Provided by: pg-auto-failover-cli_1.6.3-1_amd64

NAME

       pg_auto_failover - pg_auto_failover Documentation

INTRODUCTION TO PG_AUTO_FAILOVER

       pg_auto_failover is an extension for PostgreSQL that monitors and manages failover for Postgres
       clusters. It is optimised for simplicity and correctness.

   Single Standby Architecture
          [image: pg_auto_failover architecture with a primary and a standby node]

         pg_auto_failover   implements  Business  Continuity  for  your  PostgreSQL  services.  pg_auto_failover
         implements a single PostgreSQL service using multiple nodes  with  automated  failover,  and  automates
         PostgreSQL maintenance operations in a way that guarantees availability of the service to its users and
         applications.

         To that end, pg_auto_failover uses three nodes (machines, servers) per PostgreSQL service:

          • a PostgreSQL primary node,

          • a PostgreSQL secondary node, using Synchronous Hot Standby,

          • a pg_auto_failover Monitor node that acts both as a witness and an orchestrator.

       The pg_auto_failover Monitor implements a state machine and relies on in-core PostgreSQL facilities to
       deliver HA. For example, when the secondary node is detected to be unavailable, or when its lag is
       reported above a defined threshold (the default is 1 WAL file, or 16MB, see the
       pgautofailover.promote_wal_log_threshold GUC on the pg_auto_failover monitor), then the Monitor removes
       it from the synchronous_standby_names setting on the primary node. Until the secondary is back to being
       monitored healthy, failover and switchover operations are not allowed, preventing data loss.
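
       A quick way to inspect the current threshold is to query the GUC on the monitor's Postgres instance. The
       following is only a sketch, assuming a psql connection to the monitor database (named pg_auto_failover,
       as created by pg_autoctl create monitor):

           # check the current WAL lag threshold on the monitor
           $ psql -d pg_auto_failover -c "SHOW pgautofailover.promote_wal_log_threshold;"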

   Multiple Standby Architecture
          [image: pg_auto_failover architecture with a primary and two standby nodes]

          In the pictured architecture, pg_auto_failover implements Business Continuity and data availability by
          implementing a single PostgreSQL service using multiple nodes with automated failover and data redundancy.
         Even  after  losing any Postgres node in a production system, this architecture maintains two copies of
         the data on two different nodes.

         When using more than one standby,  different  architectures  can  be  achieved  with  pg_auto_failover,
         depending on the objectives and trade-offs needed for your production setup.

   Multiple Standbys Architecture with 3 standby nodes, one async
          [image: pg_auto_failover architecture with a primary and three standby nodes]

          When setting the three replication parameters (number of sync standbys, replication quorum, and
          candidate priority, detailed later in this document), it's possible to design very different Postgres
          architectures for your production needs.

          In this case, the system is set up with two standby nodes participating in the replication quorum,
          allowing for number_sync_standbys = 1. The system always maintains a minimum of two copies of the data
          set: one on the primary, another on either node B or node D. Whenever we lose one of those nodes, we
          still maintain this guarantee of two copies of the data set.

         Adding to that, we have the standby server  C  which  has  been  set  up  to  not  participate  in  the
         replication quorum. Node C will not be found in the synchronous_standby_names list of nodes. Also, node
         C is set up in a way to never be a candidate for failover, with candidate-priority = 0.

         This architecture would fit a situation where nodes A, B, and D are deployed in the same data center or
         availability  zone, and node C in another.  Those three nodes are set up to support the main production
         traffic and implement high availability of both the Postgres service and the data set.

          Node C might be set up for Business Continuity in case the first data center is lost, or for reporting
          needs on another application domain.

MAIN PG_AUTOCTL COMMANDS

       pg_auto_failover  includes  the command line tool pg_autoctl that implements many commands to manage your
       Postgres nodes. To implement the Postgres architectures described in this documentation, and more, it  is
       generally possible to use only some of the many pg_autoctl commands.

       This  section  of  the  documentation  is  a short introduction to the main commands that are useful when
        getting started with pg_auto_failover. More commands are available to help deal with a variety of
        situations; see the manual for the whole list.

       To  understand  which  replication settings to use in your case, see architecture_basics section and then
       the multi_node_architecture section.

       To follow a step by step guide that you can reproduce  on  your  own  Azure  subscription  and  create  a
       production Postgres setup from VMs, see the tutorial section.

        To understand how to set up pg_auto_failover in a way that is compliant with your internal security
        guidelines, read the security section.

   Command line environment, configuration files, etc
       As a command line tool pg_autoctl depends on some environment variables.  Mostly, the  tool  re-uses  the
       Postgres environment variables that you might already know.

       To  manage  a Postgres node pg_auto_failover needs to know its data directory location on-disk. For that,
       some users will find it easier to export the  PGDATA  variable  in  their  environment.  The  alternative
       consists of always using the --pgdata option that is available to all the pg_autoctl commands.
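
        For example, both of the following invocations are equivalent (the data directory path is illustrative):

           # either export PGDATA once in the environment ...
           $ export PGDATA=/var/lib/postgres/node1
           $ pg_autoctl show state

           # ... or pass --pgdata explicitly to every pg_autoctl command
           $ pg_autoctl show state --pgdata /var/lib/postgres/node1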

   Creating Postgres Nodes
       To  get  started  with  the  simplest  Postgres  failover setup, 3 nodes are needed: the pg_auto_failover
       monitor, and 2 Postgres nodes that will get assigned roles by the monitor.  One  Postgres  node  will  be
       assigned the primary role, the other one will get assigned the secondary role.

       To create the monitor use the command:

          $ pg_autoctl create monitor

        To create the Postgres nodes, use the following command on each node you want to create:

          $ pg_autoctl create postgres

        While those create commands initialize your nodes, you now have to actually run the Postgres services
        that are expected to be running. For that you can manually run the following command on every node:

          $ pg_autoctl run

       It  is  also  possible  (and recommended) to integrate the pg_auto_failover service in your usual service
       management facility. When using systemd the following commands can be  used  to  produce  the  unit  file
       configuration required:

          $ pg_autoctl show systemd
          INFO  HINT: to complete a systemd integration, run the following commands:
          INFO  pg_autoctl -q show systemd --pgdata "/tmp/pgaf/m" | sudo tee /etc/systemd/system/pgautofailover.service
          INFO  sudo systemctl daemon-reload
          INFO  sudo systemctl enable pgautofailover
          INFO  sudo systemctl start pgautofailover
          [Unit]
          ...

       While  it  is expected that for a production deployment each node actually is a separate machine (virtual
       or physical, or even a container), it is also possible to run several Postgres  nodes  all  on  the  same
       machine for testing or development purposes.

       TIP:
           When running several pg_autoctl nodes on the same machine for testing or contributing to
           pg_auto_failover, each Postgres instance needs to run on its own port, and with its own data
           directory. It can make things easier to then set the environment variables PGDATA and PGPORT in each
           terminal, shell, or tab where each instance is started.
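
        For example, a minimal per-terminal setup for two local nodes could look like the following sketch
        (paths and ports are illustrative):

           # terminal 1: first Postgres node
           $ export PGDATA=/tmp/pgaf/node1 PGPORT=5501
           $ pg_autoctl run

           # terminal 2: second Postgres node
           $ export PGDATA=/tmp/pgaf/node2 PGPORT=5502
           $ pg_autoctl run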

   Inspecting nodes
       Once your Postgres nodes have been created, and once each pg_autoctl service is running, it  is  possible
       to inspect the current state of the formation with the following command:

          $ pg_autoctl show state

        The pg_autoctl show state command outputs the current state of the system only once. Sometimes it would
        be nice to have an auto-updated display such as that provided by common tools like watch(1) or top(1).
        For that, the following commands are available (see also pg_autoctl_watch):

          $ pg_autoctl watch
          $ pg_autoctl show state --watch

       To  analyze  what's  been happening to get to the current state, it is possible to review the past events
       generated by the pg_auto_failover monitor with the following command:

          $ pg_autoctl show events

       HINT:
           The pg_autoctl show commands can be run from any node in your system. Those commands need to connect
           to the monitor and print the current state or the current known list of events as per the monitor's
           view of the system.

          Use pg_autoctl show state --local to have a view of the local state of a given node without connecting
          to the monitor Postgres instance.

          The option --json is available in most pg_autoctl commands and switches the output format from a human
          readable table form to a program friendly JSON pretty-printed output.
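
        For example, assuming PGDATA is set in the environment of the local node:

           $ pg_autoctl show state --local
           $ pg_autoctl show state --json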

   Inspecting and Editing Replication Settings
       When  creating a node it is possible to use the --candidate-priority and the --replication-quorum options
       to set the replication properties as required by your choice of Postgres architecture.

       To review the current replication settings of a formation, use one of the two following  commands,  which
       are convenient aliases (the same command with two ways to invoke it):

          $ pg_autoctl show settings
          $ pg_autoctl get formation settings

       It  is  also  possible to edit those replication settings at any time while your nodes are in production:
       you can change your mind or adjust to new elements without having to re-deploy everything. Just  use  the
       following commands to adjust the replication settings on the fly:

          $ pg_autoctl set formation number-sync-standbys
          $ pg_autoctl set node replication-quorum
          $ pg_autoctl set node candidate-priority
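
        For example, to reproduce the multiple-standby architecture described earlier, with one asynchronous
        standby that is never a failover candidate, one might run something along these lines (values are
        illustrative; point the node-level commands at the --pgdata of the node to change):

           $ pg_autoctl set formation number-sync-standbys 1
           $ pg_autoctl set node replication-quorum false --pgdata /path/to/node-c
           $ pg_autoctl set node candidate-priority 0 --pgdata /path/to/node-c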

       IMPORTANT:
          The pg_autoctl get and pg_autoctl set commands always connect to the monitor Postgres instance.

          The  pg_autoctl  set  command  then  changes  the replication settings on the node registration on the
          monitor. Then the monitor assigns the APPLY_SETTINGS state to the current primary node in  the  system
          for it to apply the new replication settings to its Postgres streaming replication setup.

           As a result, the pg_autoctl set commands require a stable state in the system to be allowed to
           proceed. Namely, the current primary node in the system must have both its Current State and its
           Assigned State set to primary, as per the pg_autoctl show state output.

   Implementing Maintenance Operations
        When a Postgres node must be taken offline for a maintenance operation, such as a kernel security
        upgrade or a minor Postgres update, it is best to make it so that the pg_auto_failover monitor knows
        about it.

          • For  one  thing, a node that is known to be in maintenance does not participate in failovers. If you
            are running with two Postgres nodes, then failover  operations  are  entirely  prevented  while  the
            standby node is in maintenance.

           • Moreover, depending on your replication settings, enabling maintenance on your standby ensures that
             the primary node switches to async replication before Postgres is shut down on the secondary, so
             that write queries are not blocked.

       To implement maintenance operations, use the following commands:

          $ pg_autoctl enable maintenance
          $ pg_autoctl disable maintenance

       The  main  pg_autoctl run service that is expected to be running in the background should continue to run
       during the whole maintenance operation.  When a node is in the maintenance state, the pg_autoctl  service
       is not controlling the Postgres service anymore.

       Note  that  it  is  possible  to  enable  maintenance on a primary Postgres node, and that operation then
       requires a failover to happen first. It is possible to have pg_auto_failover  orchestrate  that  for  you
       when using the command:

          $ pg_autoctl enable maintenance --allow-failover

       IMPORTANT:
           The pg_autoctl enable and pg_autoctl disable commands require a stable state in the system to be
           allowed to proceed. Namely, the current primary node in the system must have both its Current State
           and its Assigned State set to primary, as per the pg_autoctl show state output.

   Manual failover, switchover, and promotions
       In  the  cases  when  a  failover  is  needed without having an actual node failure, the pg_auto_failover
       monitor can be used to orchestrate the operation. Use one of the following commands, which  are  synonyms
       in the pg_auto_failover design:

          $ pg_autoctl perform failover
          $ pg_autoctl perform switchover

       Finally, it is also possible to “elect” a new primary node in your formation with the command:

          $ pg_autoctl perform promotion

       IMPORTANT:
           The pg_autoctl perform commands require a stable state in the system to be allowed to proceed.
           Namely, the current primary node in the system must have both its Current State and its Assigned State
           set to primary, as per the pg_autoctl show state output.

   What's next?
        This section of the documentation is meant to help users get started by focusing on the main commands of
        the pg_autoctl tool. Each command has many options, whose impact can range from very small to quite
        large in terms of security or architecture. Read the rest of the manual to understand how to best use
        the many pg_autoctl options to implement your specific Postgres production architecture.

PG_AUTO_FAILOVER TUTORIAL

       In this guide we’ll create a primary and secondary Postgres node and set up pg_auto_failover to replicate
       data  between  them.  We’ll simulate failure in the primary node and see how the system smoothly switches
       (fails over) to the secondary.

       For illustration, we'll run our databases on virtual machines in the Azure platform, but  the  techniques
       here are relevant to any cloud provider or on-premise network. We'll use four virtual machines: a primary
       database,  a  secondary  database,  a monitor, and an "application." The monitor watches the other nodes’
       health, manages global state, and assigns nodes their roles.

   Create virtual network
       Our database machines need to talk to each other and to the monitor  node,  so  let's  create  a  virtual
       network.

          az group create \
              --name ha-demo \
              --location eastus

          az network vnet create \
              --resource-group ha-demo \
              --name ha-demo-net \
              --address-prefix 10.0.0.0/16

       We  need  to open ports 5432 (Postgres) and 22 (SSH) between the machines, and also give ourselves access
       from our remote IP. We'll do this with a network security group and a subnet.

          az network nsg create \
              --resource-group ha-demo \
              --name ha-demo-nsg

          az network nsg rule create \
              --resource-group ha-demo \
              --nsg-name ha-demo-nsg \
              --name ha-demo-ssh-and-pg \
              --access allow \
              --protocol Tcp \
              --direction Inbound \
              --priority 100 \
              --source-address-prefixes `curl ifconfig.me` 10.0.1.0/24 \
              --source-port-range "*" \
              --destination-address-prefix "*" \
              --destination-port-ranges 22 5432

          az network vnet subnet create \
              --resource-group ha-demo \
              --vnet-name ha-demo-net \
              --name ha-demo-subnet \
              --address-prefixes 10.0.1.0/24 \
              --network-security-group ha-demo-nsg

       Finally add four virtual machines (ha-demo-a, ha-demo-b, ha-demo-monitor, and ha-demo-app). For speed  we
       background the az vm create processes and run them in parallel:

          # create VMs in parallel
          for node in monitor a b app
          do
          az vm create \
              --resource-group ha-demo \
              --name ha-demo-${node} \
              --vnet-name ha-demo-net \
              --subnet ha-demo-subnet \
              --nsg ha-demo-nsg \
              --public-ip-address ha-demo-${node}-ip \
              --image debian \
              --admin-username ha-admin \
              --generate-ssh-keys &
          done
          wait

       To make it easier to SSH into these VMs in future steps, let's make a shell function to retrieve their IP
       addresses:

          # run this in your local shell as well

          vm_ip () {
            az vm list-ip-addresses -g ha-demo -n ha-demo-$1 -o tsv \
              --query '[] [] .virtualMachine.network.publicIpAddresses[0].ipAddress'
          }

          # for convenience with ssh

          for node in monitor a b app
          do
          ssh-keyscan -H `vm_ip $node` >> ~/.ssh/known_hosts
          done

       Let's review what we created so far.

          az resource list --output table --query \
            "[?resourceGroup=='ha-demo'].{ name: name, flavor: kind, resourceType: type, region: location }"

       This shows the following resources:

          Name                             ResourceType                                           Region
          -------------------------------  -----------------------------------------------------  --------
          ha-demo-a                        Microsoft.Compute/virtualMachines                      eastus
          ha-demo-app                      Microsoft.Compute/virtualMachines                      eastus
          ha-demo-b                        Microsoft.Compute/virtualMachines                      eastus
          ha-demo-monitor                  Microsoft.Compute/virtualMachines                      eastus
          ha-demo-appVMNic                 Microsoft.Network/networkInterfaces                    eastus
          ha-demo-aVMNic                   Microsoft.Network/networkInterfaces                    eastus
          ha-demo-bVMNic                   Microsoft.Network/networkInterfaces                    eastus
          ha-demo-monitorVMNic             Microsoft.Network/networkInterfaces                    eastus
          ha-demo-nsg                      Microsoft.Network/networkSecurityGroups                eastus
          ha-demo-a-ip                     Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-app-ip                   Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-b-ip                     Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-monitor-ip               Microsoft.Network/publicIPAddresses                    eastus
          ha-demo-net                      Microsoft.Network/virtualNetworks                      eastus

   Install the pg_autoctl executable
       This  guide  uses  Debian Linux, but similar steps will work on other distributions. All that differs are
       the packages and paths. See install.

       The pg_auto_failover system is distributed as a single pg_autoctl binary with subcommands  to  initialize
       and  manage  a replicated PostgreSQL service.  We’ll install the binary with the operating system package
       manager on all nodes. It will help us run and observe PostgreSQL.

          for node in monitor a b app
          do
          az vm run-command invoke \
             --resource-group ha-demo \
             --name ha-demo-${node} \
             --command-id RunShellScript \
             --scripts \
                "sudo touch /home/ha-admin/.hushlogin" \
                "curl https://install.citusdata.com/community/deb.sh | sudo bash" \
                "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-common" \
                "echo 'create_main_cluster = false' | sudo tee -a /etc/postgresql-common/createcluster.conf" \
                "sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y postgresql-11-auto-failover-1.4" \
                "sudo usermod -a -G postgres ha-admin" &
          done
          wait

   Run a monitor
       The pg_auto_failover monitor is the first component to run. It periodically attempts to contact the other
       nodes and watches their health. It also maintains global state that “keepers” on  each  node  consult  to
       determine their own roles in the system.

          # on the monitor virtual machine

          ssh -l ha-admin `vm_ip monitor` -- \
            pg_autoctl create monitor \
              --auth trust \
              --ssl-self-signed \
              --pgdata monitor \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl

        This command initializes a PostgreSQL cluster at the location pointed to by the --pgdata option. When
        --pgdata is omitted, pg_autoctl attempts to use the PGDATA environment variable. If a PostgreSQL instance
        already existed in the destination directory, this command would have configured it to serve as a
        monitor.

        pg_auto_failover installs the pgautofailover Postgres extension, and grants access to a new autoctl_node
        user.

        In this tutorial we use --auth trust to avoid complex security settings. The Postgres trust
        authentication method is not considered a reasonable choice for production environments. Consider either
        using the --skip-pg-hba option or --auth scram-sha-256 and then setting up passwords yourself.

       At  this  point  the  monitor  is created. Now we'll install it as a service with systemd so that it will
       resume if the VM restarts.

          ssh -T -l ha-admin `vm_ip monitor` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/monitor > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

   Bring up the nodes
       We’ll create the primary database using the pg_autoctl create subcommand.

          ssh -l ha-admin `vm_ip a` -- \
            pg_autoctl create postgres \
              --pgdata ha \
              --auth trust \
              --ssl-self-signed \
              --username ha-admin \
              --dbname appdb \
              --hostname ha-demo-a.internal.cloudapp.net \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
              --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

        Notice the user and database name in the monitor connection string -- these are what the monitor
        initialization created. We also give it the path to pg_ctl so that the keeper will use the correct
        version of pg_ctl in the future even if other versions of postgres are installed on the system.

       In the example above, the keeper creates a primary database. It chooses to  set  up  node  A  as  primary
       because  the  monitor  reports there are no other nodes in the system yet. This is one example of how the
       keeper is state-based: it makes observations and then adjusts its state, in  this  case  from  "init"  to
       "single."

       Also add a setting to trust connections from our "application" VM:

          ssh -T -l ha-admin `vm_ip a` << CMD
            echo 'hostssl "appdb" "ha-admin" ha-demo-app.internal.cloudapp.net trust' \
              >> ~ha-admin/ha/pg_hba.conf
          CMD

        At this point the monitor and primary node are created and running. Next we need to run the keeper. It's
        an independent process so that it can continue operating even if the PostgreSQL process terminates on
        the node. We'll install it as a service with systemd so that it will resume if the VM restarts.

          ssh -T -l ha-admin `vm_ip a` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

       Next connect to node B and do the same process. We'll do both steps at once:

          ssh -l ha-admin `vm_ip b` -- \
            pg_autoctl create postgres \
              --pgdata ha \
              --auth trust \
              --ssl-self-signed \
              --username ha-admin \
              --dbname appdb \
              --hostname ha-demo-b.internal.cloudapp.net \
              --pgctl /usr/lib/postgresql/11/bin/pg_ctl \
              --monitor 'postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net/pg_auto_failover?sslmode=require'

          ssh -T -l ha-admin `vm_ip b` << CMD
            pg_autoctl -q show systemd --pgdata ~ha-admin/ha > pgautofailover.service
            sudo mv pgautofailover.service /etc/systemd/system
            sudo systemctl daemon-reload
            sudo systemctl enable pgautofailover
            sudo systemctl start pgautofailover
          CMD

        Node B discovers from the monitor that a primary exists, and then switches its own state to be a hot
        standby and begins streaming WAL contents from the primary.

   Node communication
       For convenience, pg_autoctl modifies each node's pg_hba.conf file to allow the nodes to  connect  to  one
       another. For instance, pg_autoctl added the following lines to node A:

          # automatically added to node A

          hostssl "appdb" "ha-admin" ha-demo-a.internal.cloudapp.net trust
          hostssl replication "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust
          hostssl "appdb" "pgautofailover_replicator" ha-demo-b.internal.cloudapp.net trust

       For  pg_hba.conf on the monitor node pg_autoctl inspects the local network and makes its best guess about
       the subnet to allow. In our case it guessed correctly:

          # automatically added to the monitor

          hostssl "pg_auto_failover" "autoctl_node" 10.0.1.0/24 trust

       If worker nodes have more ad-hoc addresses and are not  in  the  same  subnet,  it's  better  to  disable
       pg_autoctl's  automatic  modification  of  pg_hba  using  the  --skip-pg-hba  command  line option during
       creation. You will then need to edit the hba file by hand. Another reason for manual edits  would  be  to
       use special authentication methods.
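
        For example, when running with --skip-pg-hba, a manual entry allowing the application subnet with
        password authentication might look like this sketch (the subnet, database, user, and file path are
        examples):

           # hypothetical manual pg_hba.conf entry using scram-sha-256 instead of trust
           $ echo 'hostssl "appdb" "ha-admin" 10.0.1.0/24 scram-sha-256' \
               | sudo tee -a ~ha-admin/ha/pg_hba.conf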

   Watch the replication
       First let’s verify that the monitor knows about our nodes, and see what states it has assigned them:

          ssh -l ha-admin `vm_ip monitor` pg_autoctl show state --pgdata monitor

            Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/3000060 |       yes |             primary |             primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/3000060 |       yes |           secondary |           secondary

       This  looks good. We can add data to the primary, and later see it appear in the secondary. We'll connect
       to the database from inside our "app" virtual machine,  using  a  connection  string  obtained  from  the
       monitor.

          ssh -l ha-admin `vm_ip monitor` pg_autoctl show uri --pgdata monitor

                Type |    Name | Connection String
          -----------+---------+-------------------------------
             monitor | monitor | postgres://autoctl_node@ha-demo-monitor.internal.cloudapp.net:5432/pg_auto_failover?sslmode=require
           formation | default | postgres://ha-demo-b.internal.cloudapp.net:5432,ha-demo-a.internal.cloudapp.net:5432/appdb?target_session_attrs=read-write&sslmode=require

       Now we'll get the connection string and store it in a local environment variable:

          APP_DB_URI=$( \
            ssh -l ha-admin `vm_ip monitor` \
              pg_autoctl show uri --formation default --pgdata monitor \
          )

       The  connection  string  contains  both  our  nodes,  comma  separated,  and  includes  the url parameter
       ?target_session_attrs=read-write telling psql that we want to  connect  to  whichever  of  these  servers
       supports reads and writes.  That will be the primary server.

          # connect to database via psql on the app vm and
          # create a table with a million rows
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'CREATE TABLE foo AS SELECT generate_series(1,1000000) bar;'"

   Cause a failover
       Now  that  we've  added  data  to  node  A,  let's  switch  which is considered the primary and which the
       secondary. After the switch we'll connect again and query the data, this time from node B.

          # initiate failover to node B
          ssh -l ha-admin -t `vm_ip monitor` \
            pg_autoctl perform switchover --pgdata monitor

       Once node B is marked "primary" (or "wait_primary") we can connect and verify  that  the  data  is  still
       present:

          # connect to database via psql on the app vm
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'SELECT count(*) FROM foo;'"

       It shows

            count
          ---------
           1000000

   Cause a node failure
        This plot is too boring, time to introduce a problem. We'll turn off the VM for node B (currently the
        primary after our previous failover) and watch node A get promoted.

       In one terminal let’s keep an eye on events:

          ssh -t -l ha-admin `vm_ip monitor` -- \
            watch -n 1 -d pg_autoctl show state --pgdata monitor

       In another terminal we’ll turn off the virtual server.

          az vm stop \
            --resource-group ha-demo \
            --name ha-demo-b

       After  a  number  of  failed attempts to talk to node B, the monitor determines the node is unhealthy and
       puts it into the "demoted" state.  The monitor promotes node A to be the new primary.

            Name |  Node |                            Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+-----------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/6D4E068 |       yes |        wait_primary |        wait_primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/6D4E000 |       yes |             demoted |          catchingup

       Node A cannot be considered in full "primary" state since there is no secondary present, but it can still
       serve client requests. It is marked as "wait_primary" until a secondary appears, to  indicate  that  it's
       running without a backup.

       Let's add some data while B is offline.

          # notice how $APP_DB_URI continues to work no matter which node
          # is serving as primary
          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'INSERT INTO foo SELECT generate_series(1000001, 2000000);'"

   Resurrect node B
       Run this command to bring node B back online:

          az vm start \
            --resource-group ha-demo \
            --name ha-demo-b

       Now  the next time the keeper retries its health check, it brings the node back.  Node B goes through the
       state "catchingup" while it updates its data to match A. Once that's done, B becomes a secondary,  and  A
       is now a full primary again.

            Name |  Node |                            Host:Port |        LSN | Reachable |       Current State |      Assigned State
          -------+-------+--------------------------------------+------------+-----------+---------------------+--------------------
          node_1 |     1 | ha-demo-a.internal.cloudapp.net:5432 | 0/12000738 |       yes |             primary |             primary
          node_2 |     2 | ha-demo-b.internal.cloudapp.net:5432 | 0/12000738 |       yes |           secondary |           secondary

       What's more, if we connect directly to the database again, all two million rows are still present.

          ssh -l ha-admin -t `vm_ip app` -- \
            psql "'$APP_DB_URI'" \
              -c "'SELECT count(*) FROM foo;'"

       It shows

            count
          ---------
           2000000

ARCHITECTURE BASICS

       pg_auto_failover is designed as a simple and robust way to manage automated Postgres failover in
       production. On top of robust operations, the pg_auto_failover setup is flexible and allows either Business
       Continuity or High Availability configurations. The pg_auto_failover design includes configuration changes
       in a live system without downtime.

       pg_auto_failover  is designed to be able to handle a single PostgreSQL service using three nodes. In this
       setting, the system is resilient to losing any one of three nodes.
          [image: pg_auto_failover Architecture for a standalone PostgreSQL service]

          It is important to understand that when using only two Postgres nodes, pg_auto_failover is optimized
          for Business Continuity. In the event of losing a single node, pg_auto_failover is capable of
          continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL
          Synchronous Replication.

          That said, there is a trade-off involved in this architecture. The business continuity bias relaxes
          replication guarantees to asynchronous replication in the event of a standby node failure. This allows
          the PostgreSQL service to accept writes when there's a single server available, and opens the service
          to potential data loss if the primary server were also to fail.

   The pg_auto_failover Monitor
       Each PostgreSQL node in pg_auto_failover runs a Keeper process which informs a central Monitor node about
       notable local changes. Some changes require the Monitor to orchestrate a correction across the cluster:

          • New nodes

             At initialization time, it's necessary to prepare the configuration of each node for PostgreSQL
             streaming replication, and get the cluster to converge to the nominal state with both a primary and
             a secondary node in each group. The monitor determines each new node's role.

          • Node failure

            The   monitor   orchestrates  a  failover  when  it  detects  an  unhealthy  node.   The  design  of
            pg_auto_failover allows the monitor to shut down service to a  previously  designated  primary  node
            without causing a "split-brain" situation.

       The  monitor  is  the  authoritative  node  that manages global state and makes changes in the cluster by
       issuing commands to the nodes' keeper processes. A pg_auto_failover  monitor  node  failure  has  limited
       impact  on  the  system.  While  it  prevents  reacting  to  other  nodes'  failures,  it does not affect
       replication.  The PostgreSQL streaming replication setup installed by pg_auto_failover does not depend on
       having the monitor up and running.

   pg_auto_failover Glossary
       pg_auto_failover handles a single PostgreSQL service with the following concepts:

   Monitor
       The pg_auto_failover monitor is a service that keeps track of one or several formations containing groups
       of nodes.

       The monitor is implemented as a PostgreSQL extension, so when  you  run  the  command  pg_autoctl  create
       monitor  a  PostgreSQL  instance  is initialized, configured with the extension, and started. The monitor
       service embeds a PostgreSQL instance.

   Formation
       A formation is a logical set of PostgreSQL services that are managed together.

       It is possible to operate many formations with a single monitor instance.  Each formation has a group  of
       Postgres nodes and the FSM orchestration implemented by the monitor applies separately to each group.

   Group
       A  group  of  two  PostgreSQL  nodes  work  together  to  provide a single PostgreSQL service in a Highly
       Available fashion. A group consists of a PostgreSQL primary server and a secondary server setup with  Hot
       Standby  synchronous  replication. Note that pg_auto_failover can orchestrate the whole setting-up of the
       replication for you.

       In pg_auto_failover versions up to 1.3, a single Postgres group can  contain  only  two  Postgres  nodes.
       Starting  with  pg_auto_failover 1.4, there's no limit to the number of Postgres nodes in a single group.
       Note that each Postgres instance that belongs to the same group serves  the  same  dataset  in  its  data
       directory (PGDATA).

       NOTE:
          The  notion of a formation that contains multiple groups in pg_auto_failover is useful when setting up
          and managing a whole Citus formation, where  the  coordinator  nodes  belong  to  group  zero  of  the
          formation, and each Citus worker node becomes its own group and may have Postgres standby nodes.

   Keeper
       The  pg_auto_failover  keeper  is  an agent that must be running on the same server where your PostgreSQL
       nodes are running. The keeper controls the local PostgreSQL instance (using both the pg_ctl  command-line
       tool and SQL queries), and communicates with the monitor:

          • it  sends  updated data about the local node, such as the WAL delta in between servers, measured via
            PostgreSQL statistics views.

          • it receives state assignments from the monitor.

       Also the keeper maintains local state that includes the most recent communication  established  with  the
       monitor and the other PostgreSQL node of its group, enabling it to detect network_partitions.

       NOTE:
          In  pg_auto_failover  versions up to and including 1.3, the keeper process started with pg_autoctl run
          manages a separate Postgres instance, running as its own process tree.

          Starting in pg_auto_failover version 1.4, the keeper process (started with pg_autoctl  run)  runs  the
          Postgres  instance  as a sub-process of the main pg_autoctl process, allowing tighter control over the
          Postgres execution. Running the sub-process also makes the solution  work  better  both  in  container
          environments  (because  it's  now  a single process tree) and with systemd, because it uses a specific
          cgroup per service unit.

   Node
       A node is a server (virtual or physical) that runs PostgreSQL instances and  a  keeper  service.  At  any
       given  time,  any  node  might  be  a  primary  or  a  secondary  Postgres  instance.  The whole point of
       pg_auto_failover is to decide this state.

       As a result, refrain from naming your nodes with the role you intend for them.  Their roles  can  change.
       If they didn't, your system wouldn't need pg_auto_failover!

   State
       A  state  is  the representation of the per-instance and per-group situation.  The monitor and the keeper
       implement a Finite State Machine to drive operations in the PostgreSQL groups; allowing  pg_auto_failover
       to implement High Availability with the goal of zero data loss.

       The  keeper  main  loop enforces the current expected state of the local PostgreSQL instance, and reports
       the current state and some more information to the monitor. The monitor uses this set of information  and
       its own health-check information to drive the State Machine and assign a goal state to the keeper.

       The keeper implements the transitions between a current state and a monitor-assigned goal state.

   Client-side HA
        Client-side High Availability support is included in PostgreSQL's driver libpq from version 10 onward.
        Using this driver, it is possible to specify multiple host names or IP addresses in the same connection
        string:

          $ psql -d "postgresql://host1,host2/dbname?target_session_attrs=read-write"
           $ psql -d "postgresql://host1:port1,host2:port2/dbname?target_session_attrs=read-write"
          $ psql -d "host=host1,host2 port=port1,port2 target_session_attrs=read-write"

        When using any of the syntaxes above, the psql application attempts to connect to host1, and when
        successfully connected, checks the target_session_attrs as per the PostgreSQL documentation:
          If this parameter is set to read-write,  only  a  connection  in  which  read-write  transactions  are
          accepted  by default is considered acceptable.  The query SHOW transaction_read_only will be sent upon
          any successful connection; if it returns on, the connection will be closed.  If  multiple  hosts  were
          specified  in  the  connection  string,  any remaining servers will be tried just as if the connection
          attempt had failed. The default value of this parameter, any, regards all connections as acceptable.

       When the connection attempt to host1 fails, or when the target_session_attrs can not  be  verified,  then
       the psql application attempts to connect to host2.

       The behavior is implemented in the connection library libpq, so any application using it can benefit from
       this implementation, not just psql.

       When  using  pg_auto_failover,  configure  your  application connection string to use the primary and the
       secondary server host names, and  set  target_session_attrs=read-write  too,  so  that  your  application
       automatically connects to the current primary, even after a failover occurred.

   Monitoring protocol
       The monitor interacts with the data nodes in 2 ways:

          • Data  nodes periodically connect and run SELECT pgautofailover.node_active(...) to communicate their
            current state and obtain their goal state.

          • The monitor periodically connects to all the data nodes to  see  if  they  are  healthy,  doing  the
            equivalent of pg_isready.

        When a data node calls node_active, the state of the node is stored in the pgautofailover.node table and
        the state machines of both nodes are progressed. The state machines are described later in this
        document. The monitor typically only moves one state forward and waits for the node(s) to converge
        except in failure states.
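
        To see what the monitor has recorded, its tables can be queried directly; a minimal sketch, assuming a
        psql connection to the monitor's pg_auto_failover database:

           # list the node registrations as stored by the monitor
           $ psql -d pg_auto_failover -c "SELECT * FROM pgautofailover.node;"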

        If a node is not communicating to the monitor, it will either cause a failover (if the node is a
        primary), disable synchronous replication (if the node is a secondary), or cause the state machine to
        pause until the node comes back (other cases). In most cases, the latter is harmless, though in some
        cases it may cause downtime to last longer, e.g. if a standby goes down during a failover.

       To simplify operations, a node is only considered unhealthy if the monitor cannot connect and  it  hasn't
       reported  its state through node_active for a while. This allows, for example, PostgreSQL to be restarted
       without causing a health check failure.

   Synchronous vs. asynchronous replication
        By default, pg_auto_failover uses synchronous replication, which means all writes block until at least
        one standby node has reported receiving them. To handle cases in which the standby fails, the primary
        switches between two states called wait_primary and primary based on the health of standby nodes, and
        based on the replication setting number_sync_standbys.

       When   in   the  wait_primary  state,  synchronous  replication  is  disabled  by  automatically  setting
       synchronous_standby_names = '' to allow writes to proceed. However doing so also disables failover, since
       the standby might get arbitrarily far behind. If the standby is responding to health checks and within  1
       WAL  segment  of  the  primary  (by  default), synchronous replication is enabled again on the primary by
       setting synchronous_standby_names = '*' which may cause a short latency  spike  since  writes  will  then
       block until the standby has caught up.
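
        One way to observe which mode the primary is currently in is to check the setting directly on the
        primary; a sketch, assuming a psql connection to the primary node:

           # '*' means synchronous replication is active, '' means it is disabled
           $ psql -c "SHOW synchronous_standby_names;"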

        When using several standby nodes with replication quorum enabled, the actual setting for
        synchronous_standby_names is set to the list of those standby nodes that are set to participate in the
        replication quorum.

       If you wish to disable synchronous replication, you need to add the following to postgresql.conf:

          synchronous_commit = 'local'

        This ensures that writes return as soon as they are committed on the primary -- under all circumstances.
        In that case, failover might lead to some data loss, but failover is not initiated if the secondary is
        more than 10 WAL segments (by default) behind the primary. During a manual failover, the standby will
        continue accepting writes from the old primary. The standby will stop accepting writes only if it's fully
        caught up (most common), the primary fails, or it does not receive writes for 2 minutes.

   A note about performance
        In some cases the write latency cost of synchronous replication makes the application fail to deliver
        its expected performance. If testing or production feedback shows this to be the case, it is beneficial
        to switch to using asynchronous replication.

       The way to use asynchronous replication in pg_auto_failover is to change the synchronous_commit  setting.
       This setting can be set per transaction, per session, or per user. It does not have to be set globally on
       your Postgres instance.

       One way to benefit from that would be:

          alter role fast_and_loose set synchronous_commit to local;

       That way performance-critical parts of the application don't have to wait for the standby nodes. Only use
       this when you can also lower your data durability guarantees.
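
        The same relaxation can also be scoped to a single session or transaction rather than a whole role. The
        following sketch reuses the appdb database and foo table from the tutorial above:

           # both statements run in the same psql session, so the INSERT commits
           # asynchronously while other sessions keep synchronous commits
           $ psql -d appdb \
               -c "SET synchronous_commit TO local;" \
               -c "INSERT INTO foo SELECT generate_series(2000001, 3000000);"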

   Node recovery
       When  bringing a node back after a failover, the keeper (pg_autoctl run) can simply be restarted. It will
       also restart postgres if needed and obtain its goal state from the monitor. If  the  failed  node  was  a
       primary  and  was  demoted,  it will learn this from the monitor. Once the node reports, it is allowed to
       come back as a standby by running  pg_rewind.  If  it  is  too  far  behind,  the  node  performs  a  new
       pg_basebackup.

MULTI-NODE ARCHITECTURES

       Pg_auto_failover  allows  you  to  have more than one standby node, and offers advanced control over your
       production architecture characteristics.

   Architectures with two standby nodes
       When adding your second standby node with default settings, you get the following architecture:
          [image: pg_auto_failover architecture with two standby nodes]

         In  this case, three nodes get set up with the same characteristics, achieving HA for both the Postgres
         service and the production dataset. An important setting for this architecture is number_sync_standbys.

         The replication setting number_sync_standbys sets how many standby nodes the primary  should  wait  for
         when  committing  a  transaction. In order to have a good availability in your system, pg_auto_failover
         requires number_sync_standbys + 1 standby nodes participating in the replication  quorum:  this  allows
         any standby node to fail without impact on the system's ability to respect the replication quorum.

         When  only  two nodes are registered in a group on the monitor we have a primary and a single secondary
         node. Then number_sync_standbys can only be set to zero.  When  adding  a  second  standby  node  to  a
         pg_auto_failover  group,  then  the monitor automatically increments number_sync_standbys to one, as we
         see in the diagram above.

         When number_sync_standbys is set to zero then pg_auto_failover implements the Business Continuity setup
         as seen in architecture_basics: synchronous replication is  then  used  as  a  way  to  guarantee  that
         failover can be implemented without data loss.

         In more details:

          1. With number_sync_standbys set to one, this architecture always maintains two copies of the dataset:
             one  on  the  current  primary  node  (node A in the previous diagram), and one on the standby that
             acknowledges the transaction first (either node B or node C in the diagram).

             When one of the standby nodes is  unavailable,  the  second  copy  of  the  dataset  can  still  be
             maintained thanks to the remaining standby.

             When  both  the  standby  nodes  are  unavailable,  then  it's  no longer possible to guarantee the
             replication quorum, and thus writes on the primary are blocked. The  Postgres  primary  node  waits
             until  at  least  one  standby node acknowledges the transactions locally committed, thus degrading
             your Postgres service to read-only.

           2.  It is possible to manually set number_sync_standbys to zero when having registered two standby
              nodes to the monitor, overriding the default behavior.

              In that case, when the second standby node becomes unhealthy at the same time as the first standby
              node,  the primary node is assigned the state wait_primary. In that state, synchronous replication
              is disabled on the primary by setting synchronous_standby_names to an  empty  string.  Writes  are
              allowed  on  the primary, even though there's no extra copy of the production dataset available at
              this time.

              Setting number_sync_standbys to zero allows data to be written even when both  standby  nodes  are
              down.  In this case, a single copy of the production data set is kept and, if the primary was then
              to fail, some data will be lost. How much depends on your backup and recovery mechanisms.

   Replication Settings and Postgres Architectures
       The entire flexibility of  pg_auto_failover  can  be  leveraged  with  the  following  three  replication
       settings:

           • Number of sync standbys

          • Replication quorum

          • Candidate priority

   Number Sync Standbys
        This parameter is used by Postgres in the synchronous_standby_names parameter: number_sync_standbys is
        the number of synchronous standbys for whose replies transactions must wait.

       This  parameter  can  be  set  at the formation level in pg_auto_failover, meaning that it applies to the
       current primary, and "follows" a failover to apply to any new primary that might replace the current one.

       To set this parameter to the value <n>, use the following command:

          pg_autoctl set formation number-sync-standbys <n>

       The  default  value  in  pg_auto_failover  is  zero.  When  set   to   zero,   the   Postgres   parameter
       synchronous_standby_names can be set to either '*' or to '':

       • synchronous_standby_names  =  '*'  means that any standby may participate in the replication quorum for
         transactions with synchronous_commit set to on or higher values.

          pg_auto_failover uses synchronous_standby_names = '*' when there's at least one standby that is known
          to be healthy.

        • synchronous_standby_names = '' (empty string) disables synchronous commit and makes all your commits
          asynchronous, meaning that transaction commits will not wait for replication. In other words, a single
          copy of your production data is maintained when synchronous_standby_names is set that way.

          pg_auto_failover uses synchronous_standby_names = '' only when number_sync_standbys is set to zero and
          there's no standby node known healthy by the monitor.

       In  order  to  set  number_sync_standbys  to  a  non-zero  value, pg_auto_failover requires that at least
       number_sync_standbys + 1 standby nodes be registered in the system.

       When the first standby node is added to the pg_auto_failover  monitor,  the  only  acceptable  value  for
       number_sync_standbys is zero. When a second standby is added that participates in the replication quorum,
       then number_sync_standbys is automatically set to one.

       The  command  pg_autoctl  set  formation  number-sync-standbys  can  be  used to change the value of this
       parameter in a formation, even when all the nodes are already running in production. The pg_auto_failover
       monitor then sets a transition for the primary to update its local value of synchronous_standby_names.

   Replication Quorum
       The  replication  quorum  setting  is  a  boolean  and  defaults  to  true,  and  can  be  set  per-node.
       Pg_auto_failover  includes  a  given  node  in synchronous_standby_names only when the replication quorum
       parameter has been set to true. This means that asynchronous replication will be  used  for  nodes  where
       replication-quorum is set to false.

        It is possible to force asynchronous replication globally by setting replication quorum to false on all
        the nodes in a formation. Remember that failovers will happen, so set your replication settings on the
        current primary node too when needed: it is going to be a standby later.

       To set this parameter to either true or false, use one of the following commands:

          pg_autoctl set node replication-quorum true
          pg_autoctl set node replication-quorum false

   Candidate Priority
       The  candidate  priority setting is an integer that can be set to any value between 0 (zero) and 100 (one
       hundred). The default value is 50. When the pg_auto_failover monitor decides to orchestrate  a  failover,
       it uses each node's candidate priority to pick the new primary node.

        When setting the candidate priority of a node down to zero, this node will never be selected to be
        promoted as the new primary when a failover is orchestrated by the monitor. The monitor will instead
        wait until another registered node is healthy and in a position to be promoted.

       To set this parameter to the value <n>, use the following command:

          pg_autoctl set node candidate-priority <n>

       When nodes have the same candidate priority, the monitor then picks the standby with  the  most  advanced
       LSN  position  published  to  the monitor. When more than one node has published the same LSN position, a
       random one is chosen.

        When the candidate for failover has not published the most advanced LSN position in the WAL,
        pg_auto_failover orchestrates an intermediate step in the failover mechanism. The candidate fetches the
        missing WAL bytes from one of the standbys with the most advanced LSN position prior to being promoted.
        Postgres allows this operation thanks to cascading replication: any standby can be the upstream node for
        another standby.

       It is required at all times  that  at  least  two  nodes  have  a  non-zero  candidate  priority  in  any
       pg_auto_failover formation. Otherwise no failover is possible.

   Auditing replication settings
       The  command  pg_autoctl  get  formation settings (also known as pg_autoctl show settings) can be used to
       obtain a summary of all the replication settings currently in effect in  a  formation.  Still  using  the
       first diagram on this page, we get the following summary:

          $ pg_autoctl get formation settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 50

       We  can  see  that  the  number_sync_standbys  has  been  used  to  compute  the  current  value  of  the
       synchronous_standby_names setting on the primary.

       Because all the nodes in that example have the same default candidate priority (50), pg_auto_failover
       uses the form ANY 1 with the list of standby nodes that are currently participating in the replication
       quorum.

       The entries in the synchronous_standby_names list are meant  to  match  the  application_name  connection
       setting  used in the primary_conninfo, and the format used by pg_auto_failover there is the format string
       "pgautofailover_standby_%d" where %d is replaced by the node id. This allows keeping the same  connection
       string to the primary when the node name is changed (using the command pg_autoctl set node metadata
       --name).

       Here we can see the node id of each registered Postgres node with the following command:

          $ pg_autoctl show state
            Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
          -------+-------+----------------+-----------+-----------+---------------------+--------------------
          node_A |     1 | localhost:5001 | 0/7002310 |       yes |             primary |             primary
          node_B |     2 | localhost:5002 | 0/7002310 |       yes |           secondary |           secondary
          node_C |     3 | localhost:5003 | 0/7002310 |       yes |           secondary |           secondary
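
       The value computed for synchronous_standby_names can also be displayed directly with the following
       command (the output shown here is illustrative, matching the example settings above):

           $ pg_autoctl show standby-names
           'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'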

       When setting the per-formation number_sync_standbys and the per-node replication quorum and candidate
       priority replication settings, those properties are used to compute the synchronous_standby_names value
       on the primary node. This value is automatically maintained on the primary by pg_auto_failover, and is
       updated either when replication settings are changed or when a failover happens.

       The other situation when the pg_auto_failover replication settings are used is a candidate election when
       a failover happens and there are more than two nodes registered in a group. Then the node with the
       highest candidate priority is selected, as detailed above in the Candidate Priority section.

   Sample architectures with three standby nodes
       When  setting  the  three parameters above, it's possible to design very different Postgres architectures
       for your production needs.
         [image: pg_auto_failover architecture with three standby nodes] [image]  pg_auto_failover  architecture
         with three standby nodes.UNINDENT

         In  this  case,  the  system  is  set  up  with  three standby nodes all set the same way, with default
         parameters. The default parameters support setting number_sync_standbys = 2. This means  that  Postgres
         will maintain three copies of the production data set at all times.

         On the other hand, if two standby nodes were to fail at the same time, despite the fact that two copies
         of the data are still maintained, the Postgres service would be degraded to read-only.

         With this architecture diagram, here's the summary that we obtain:

          $ pg_autoctl show settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+---------------------------------------------------------------------------------------
          formation | default |      number_sync_standbys | 2
            primary |  node_A | synchronous_standby_names | 'ANY 2 (pgautofailover_standby_2, pgautofailover_standby_4, pgautofailover_standby_3)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | true
               node |  node_D |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 50
               node |  node_D |        candidate priority | 50

   Sample architecture with three standby nodes, one async
         [image:  pg_auto_failover  architecture  with  three standby nodes, one async] [image] pg_auto_failover
         architecture with three standby nodes, one async.UNINDENT

         In this case, the system is set up with two standby nodes  participating  in  the  replication  quorum,
         allowing for number_sync_standbys = 1. The system always maintains at least two copies of the data set,
         one  on  the  primary,  another on either node B or node D. Whenever we lose one of those nodes, we can
         hold to the guarantee of having two copies of the data set.

         Additionally, we have the standby server C which has been set up to not participate in the  replication
         quorum. Node C will not be found in the synchronous_standby_names list of nodes. Also, node C is set up
         to never be a candidate for failover, with candidate-priority = 0.

          This architecture would fit a situation where nodes A, B, and D are deployed in the same data center
          or availability zone, and node C in another one. Those three nodes are set up to support the main
          production traffic and implement high availability of both the Postgres service and the data set.

         Node  C  might  be  set  up for Business Continuity in case the first data center is lost, or maybe for
         reporting needs on another application domain.
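
          Such a setup for node C can be obtained with the following commands, run locally on node C (a sketch;
          depending on your environment you may also need the --pgdata option):

           # opt node C out of the replication quorum (asynchronous replication)
           $ pg_autoctl set node replication-quorum false

           # make sure node C is never a candidate for failover
           $ pg_autoctl set node candidate-priority 0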

         With this architecture diagram, here's the summary that we obtain:

           $ pg_autoctl show settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |  node_A | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_4, pgautofailover_standby_2)'
               node |  node_A |        replication quorum | true
               node |  node_B |        replication quorum | true
               node |  node_C |        replication quorum | false
               node |  node_D |        replication quorum | true
               node |  node_A |        candidate priority | 50
               node |  node_B |        candidate priority | 50
               node |  node_C |        candidate priority | 0
               node |  node_D |        candidate priority | 50

FAILOVER STATE MACHINE

   Introduction
       pg_auto_failover uses a state machine for highly controlled execution.  As  keepers  inform  the  monitor
       about new events (or fail to contact it at all), the monitor assigns each node both a current state and a
       goal  state.  A  node's current state is a strong guarantee of its capabilities. States themselves do not
       cause any actions; actions happen during state transitions. The assigned goal states  inform  keepers  of
       what transitions to attempt.

   Example of state transitions in a new cluster
       A  good  way to get acquainted with the states is by examining the transitions of a cluster from birth to
       high availability.

       After starting a monitor and running keeper init  for  the  first  data  node  ("node  A"),  the  monitor
       registers  the  state  of  that  node  as  "init" with a goal state of "single." The init state means the
       monitor knows nothing about the node other than its existence because the keeper is not yet  continuously
       running there to report node health.

       Once  the  keeper  runs and reports its health to the monitor, the monitor assigns it the state "single,"
       meaning it is just an ordinary Postgres server with no failover. Because there are not yet other nodes in
       the cluster, the monitor also assigns node A the goal state of single -- there's nothing  that  node  A's
       keeper needs to change.

       As  soon  as  a  new  node  ("node  B")  is  initialized,  the  monitor  assigns node A the goal state of
       "wait_primary." This means the node  still  has  no  failover,  but  there's  hope  for  a  secondary  to
       synchronize  with it soon. To accomplish the transition from single to wait_primary, node A's keeper adds
       node B's hostname to pg_hba.conf to allow a hot standby replication connection.

       At the same  time,  node  B  transitions  into  wait_standby  with  the  goal  initially  of  staying  in
       wait_standby.  It  can  do  nothing  but  wait  until  node A gives it access to connect. Once node A has
       transitioned to wait_primary, the monitor assigns B the goal of "catchingup," which gives B's keeper  the
       green light to make the transition from wait_standby to catchingup. This transition involves running
       pg_basebackup, editing recovery.conf, and restarting PostgreSQL in Hot Standby mode.

       Node B reports to the monitor when it's in hot standby mode and able to connect to node  A.  The  monitor
       then  assigns  node  B the goal state of "secondary" and A the goal of "primary." Postgres ships WAL logs
       from node A and replays them on B. Finally B is caught up and tells the monitor (specifically  B  reports
       its pg_stat_replication.sync_state and WAL replay lag). At this glorious moment the monitor assigns A the
       state primary (goal: primary) and B secondary (goal: secondary).
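
       While those transitions happen, you can follow along from any node with the following commands, which are
       documented later in this manual:

           $ pg_autoctl show state
           $ pg_autoctl watch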

   State reference
       The following diagram shows the pg_auto_failover State Machine. It's missing links to the single state,
       which can always be reached by removing all the other nodes.
         [image: pg_auto_failover Finite State Machine diagram] [image] pg_auto_failover  Finite  State  Machine
         diagram.UNINDENT

          In the previous diagram we can see a list of five states where the application can connect to a
          read-write Postgres service: single, wait_primary, primary, prepare_maintenance, and apply_settings.

   Init
       A  node is assigned the "init" state when it is first registered with the monitor. Nothing is known about
       the node at this point beyond its existence.  If no other node has been registered with the  monitor  for
       the  same  formation and group ID then this node is assigned a goal state of "single." Otherwise the node
       has the goal state of "wait_standby."

   Single
       There is only one node in the  group.  It  behaves  as  a  regular  PostgreSQL  instance,  with  no  high
       availability and no failover. If the administrator removes a node, the other node in the group reverts to
       the single state.

   Wait_primary
       Applied to a node intended to be the primary but not yet in that position.   The  primary-to-be  at  this
       point  knows  the secondary's node name or IP address, and has granted the node hot standby access in the
       pg_hba.conf file.

       The wait_primary state may be caused either by a  new  potential  secondary  being  registered  with  the
       monitor  (good),  or  an  existing  secondary  becoming  unhealthy  (bad). In the latter case, during the
       transition from primary to wait_primary, the primary node's keeper disables  synchronous  replication  on
       the node. It also cancels currently blocked queries.

   Join_primary
       Applied  to  a  primary  node  when another standby is joining the group. This allows the primary node to
       apply necessary changes to its HBA setup before allowing the new node  joining  the  system  to  run  the
       pg_basebackup command.

       IMPORTANT:
          This  state  has  been  deprecated,  and  is  no longer assigned to nodes. Any time we would have used
          join_primary before, we now use primary instead.

   Primary
       A healthy secondary node exists and has caught up with WAL replication.  Specifically, the keeper reports
       the  primary  state  only  when  it  has  verified   that   the   secondary   is   reported   "sync"   in
       pg_stat_replication.sync_state, and with a WAL lag of 0.

       The  primary  state  is  a  strong  assurance.  It's  the  only state where we know we can fail over when
       required.

       During the transition from wait_primary to primary, the keeper also enables synchronous replication. This
       means that after a failover the secondary will be fully up to date.

   Wait_standby
       The monitor decides this node is a standby. The node must wait until the primary has authorized it to
       connect and set up hot standby replication.

   Catchingup
       The monitor assigns catchingup to the standby node when the primary is ready for a replication connection
       (pg_hba.conf has been properly edited, connection role added, etc).

       The standby node keeper runs pg_basebackup, connecting to the primary's hostname and port. The keeper
       then edits recovery.conf and starts PostgreSQL in hot standby mode.

   Secondary
       A  node  with  this  state is acting as a hot standby for the primary, and is up to date with the WAL log
       there. In particular, it is within 16MB or 1 WAL segment of the primary.

   Maintenance
       The cluster administrator can manually move a secondary into the maintenance state to gracefully take it
       offline. The primary will then transition from state primary to wait_primary, during which time it
       remains available for writes. When the primary reaches the wait_primary state, the secondary is safe to
       take offline with minimal consequences.

   Prepare_maintenance
       The  cluster administrator can manually move a primary node into the maintenance state to gracefully take
       it offline. The primary then transitions to the prepare_maintenance state to make sure the  secondary  is
       not missing any writes. In the prepare_maintenance state, the primary shuts down.

   Wait_maintenance
       The cluster administrator can manually move a secondary into the maintenance state to gracefully take it
       offline. Before reaching the maintenance state though, we want to switch the primary node to asynchronous
       replication, in order to avoid writes being blocked. In the  state  wait_maintenance  the  standby  waits
       until the primary has reached wait_primary.

   Draining
       A  state  between primary and demoted where replication buffers finish flushing. A draining node will not
       accept new client writes, but will continue to send existing data to the secondary.

       To implement that with Postgres we actually stop the service. When stopping, Postgres  ensures  that  the
       current replication buffers are flushed correctly to synchronous standbys.

   Demoted
       The primary keeper or its database was unresponsive past a certain threshold. The monitor assigns the
       demoted state to the primary to avoid a split-brain scenario where two nodes that don't communicate with
       each other might both accept client writes.

       In that state the keeper stops PostgreSQL and prevents it from running.

   Demote_timeout
       If  the  monitor  assigns  the  primary  a  demoted goal state but the primary keeper doesn't acknowledge
       transitioning to that state within a timeout window, then  the  monitor  assigns  demote_timeout  to  the
       primary.

       This most commonly happens when the primary machine goes silent and the keeper is no longer reporting to
       the monitor.

   Stop_replication
       The stop_replication state is meant to ensure that the primary goes  to  the  demoted  state  before  the
       standby goes to single and accepts writes (in case the primary can’t contact the monitor anymore). Before
       promoting the secondary node, the keeper stops PostgreSQL on the primary to avoid split-brain situations.

       For  safety,  when  the  primary  fails  to  contact  the  monitor  and fails to see the pg_auto_failover
       connection in pg_stat_replication, then it goes to the demoted state of its own accord.

   Prepare_promotion
       The prepare_promotion state is meant to prepare the standby server to be promoted. This state allows
       synchronisation on the monitor, making sure that the primary has stopped Postgres before promoting the
       secondary, hence preventing split-brain situations.

   Report_LSN
       The report_lsn state is assigned to standby nodes when a failover is orchestrated and there are several
       standby nodes. In order to pick the standby that is the furthest along in replication, pg_auto_failover
       first needs a fresh report of the current LSN position reached on each standby node.

       When a node reaches the report_lsn state, the replication stream is stopped by restarting Postgres
       without a primary_conninfo. This allows the primary node to detect network partitions, i.e. when the
       primary can't connect to the monitor and there's no standby listed in pg_stat_replication.

   Fast_forward
       The fast_forward state is assigned to the selected promotion candidate during a failover when it won  the
       election  thanks  to  the  candidate  priority  settings,  but the selected node is not the most advanced
       standby node as reported in the report_lsn state.

       Missing WAL bytes are fetched from one of the most advanced standby nodes  by  using  Postgres  cascading
       replication features: it is possible to use any standby node in the primary_conninfo.

   Dropped
       The  dropped  state  is assigned to a node when the pg_autoctl drop node command is used. This allows the
       node to implement specific local actions before being entirely removed from the monitor database.

       When a node reports reaching the dropped state, the monitor removes its entry. If a node is not reporting
       anymore, maybe because it's completely unavailable, then it's possible to run the  pg_autoctl  drop  node
       --force command, and then the node entry is removed from the monitor.

   Failover logic
       This  section needs to be expanded further, but below is the failover state machine for each node that is
       implemented by the monitor:
         [image: Node state machine] [image] Node state machine.UNINDENT

         Since the state machines of the data nodes always move in tandem, a pair (group)  of  data  nodes  also
         implicitly has the following state machine:
         [image: Group state machine] [image] Group state machine.UNINDENT

   pg_auto_failover keeper's State Machine
       When  built in TEST mode, it is then possible to use the following command to get a visual representation
       of the Keeper's Finite State Machine:

          $ PG_AUTOCTL_DEBUG=1 pg_autoctl do fsm gv | dot -Tsvg > fsm.svg

       The dot program is part of the Graphviz suite and produces the following output:
         [image: Keeper state machine] [image] Keeper State Machine.UNINDENT

FAILOVER AND FAULT TOLERANCE

       At the heart of the pg_auto_failover implementation is a State Machine. The state machine  is  driven  by
       the monitor, and its transitions are implemented in the keeper service, which then reports success to the
       monitor.

       The keeper is allowed to retry transitions as many times as needed until they succeed, and also reports
       failures to reach the assigned state to the monitor node. The monitor also implements frequent
       health-checks targeting the registered PostgreSQL nodes.

       When the monitor detects something is not as expected, it takes action by assigning a new goal state to
       the keeper, which is responsible for implementing the transition to this new state and then reporting
       back.

   Unhealthy Nodes
       The pg_auto_failover monitor is responsible for running regular health-checks with every PostgreSQL  node
       it  manages.  A  health-check  is  successful when it is able to connect to the PostgreSQL node using the
       PostgreSQL protocol (libpq), imitating the pg_isready command.

       How frequent those health checks are (20s by default), the PostgreSQL connection timeout in  use  (5s  by
       default),  and  how  many  times  to  retry  in case of a failure before marking the node unhealthy (2 by
       default) are GUC variables that you can set  on  the  Monitor  node  itself.  Remember,  the  monitor  is
       implemented as a PostgreSQL extension, so the setup is a set of PostgreSQL configuration settings:

            SELECT name, setting
              FROM pg_settings
             WHERE name ~ 'pgautofailover\.health';
                            name                   | setting
          -----------------------------------------+---------
           pgautofailover.health_check_max_retries | 2
           pgautofailover.health_check_period      | 20000
           pgautofailover.health_check_retry_delay | 2000
           pgautofailover.health_check_timeout     | 5000
          (4 rows)
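
       Those parameters can be changed like any other Postgres setting. For example, to raise the health check
       connection timeout to 10 seconds on the monitor, one might run the following (a sketch; check the
       parameter's context in pg_settings to confirm that a configuration reload is enough on your version):

           ALTER SYSTEM SET pgautofailover.health_check_timeout TO 10000;
           SELECT pg_reload_conf();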

       The pg_auto_failover keeper also reports whether PostgreSQL is running as expected. This is useful for
       situations where the PostgreSQL server / OS is running fine and the keeper (pg_autoctl run) is still
       active, but PostgreSQL has failed. Situations might include a full file system on the WAL disk, file
       system level corruption, missing files, etc.

       Here's what happens to your PostgreSQL service when any single-node failure is observed:

          • Primary node is monitored unhealthy

             When the primary node is unhealthy, and only when the secondary node is itself in good health, then
             the primary node is asked to transition to the DRAINING state, and the attached secondary is asked
             to transition to the state PREPARE_PROMOTION. In this state, the secondary is asked to catch up
             with the WAL traffic from the primary, and then to report success.

            The monitor then continues orchestrating  the  promotion  of  the  standby:  it  stops  the  primary
            (implementing  STONITH  in  order to prevent any data loss), and promotes the secondary into being a
            primary now.

             Depending on the exact situation that caused the primary to be unhealthy, the secondary might fail
             to catch up with the WAL from it. In that case, after the PREPARE_PROMOTION_CATCHUP_TIMEOUT has
             elapsed, the standby reports success anyway, and the failover sequence continues from the monitor.

          • Secondary node is monitored unhealthy

            When  the  secondary  node is unhealthy, the monitor assigns to it the state CATCHINGUP, and assigns
            the state WAIT_PRIMARY to the primary  node.  When  implementing  the  transition  from  PRIMARY  to
            WAIT_PRIMARY, the keeper disables synchronous replication.

             When the keeper reports an acceptable WAL difference between the two nodes again, the replication
             is upgraded back to being synchronous. While a secondary node is not in the SECONDARY state,
             secondary promotion is disabled.

          • Monitor node has failed

             Then the primary and secondary nodes work as if you had not set up pg_auto_failover in the first
             place: the keepers fail to report the local state of the nodes, and health checks are not
             performed. This means that no automated failover can happen, even if needed.

   Network Partitions
       Adding to those simple situations, pg_auto_failover is also resilient to Network Partitions. Here's the
       list of situations that have an impact on pg_auto_failover behavior, and the actions taken to ensure High
       Availability of your PostgreSQL service:

          • Primary can't connect to Monitor

             Then it could be that either the primary is alone on its side of a network split, or that the
             monitor has failed. The keeper decides depending on whether the secondary node is still connected
             to the replication slot; if we still have a connected secondary, the primary continues to serve
             PostgreSQL queries.

             Otherwise, when the secondary isn't connected, and after the NETWORK_PARTITION_TIMEOUT has elapsed,
             the primary considers it might be alone in a network partition: that's a potential split-brain
             situation, and there is only one way to prevent it. The primary stops and reports a new state of
             DEMOTE_TIMEOUT.

             The network_partition_timeout can be set in the keeper's configuration and defaults to 20s (see the
             sketch after this list).

          • Monitor can't connect to Primary

             Once all the retries have been done and the timeouts have elapsed, the primary node is considered
             unhealthy, and the monitor begins the failover routine. This routine has several steps, each of
             which allows controlling our expectations and stepping back if needed.

             For the failover to happen, the secondary node needs to be healthy and caught up with the primary.
             Only if we time out while waiting for the WAL delta to be resorbed (30s by default) is the
             secondary promoted, with uncertainty about the data durability in the group.

          • Monitor can't connect to Secondary

            As soon as the secondary is considered unhealthy then the monitor changes the replication setting to
            asynchronous  on the primary, by assigning it the WAIT_PRIMARY state. Also the secondary is assigned
            the state CATCHINGUP, which means it can't be promoted in case of primary failure.

             As the monitor tracks the WAL delta between the two servers, and both report it independently, the
             standby becomes eligible for promotion again as soon as it has caught up with the primary. At that
             point it is assigned the SECONDARY state, and the replication is switched back to synchronous.
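
       As mentioned above, the network partition timeout is part of the keeper's configuration. A sketch of how
       to inspect and change it follows; the exact key name is an assumption here, so verify it against the
       output of pg_autoctl config get on your version:

           # list the keeper's configuration, including its timeout settings
           $ pg_autoctl config get | grep -i timeout

           # key name assumed; adjust to what the previous command reports
           $ pg_autoctl config set timeout.network_partition_timeout 30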

   Failure handling and network partition detection
       If  a  node  cannot  communicate to the monitor, either because the monitor is down or because there is a
       problem with the network, it will simply remain in the same state until the monitor comes back.

       If there is a network partition, it might be that the monitor and secondary can still communicate and the
       monitor decides to promote the secondary since the  primary  is  no  longer  responsive.  Meanwhile,  the
       primary  is  still  up-and-running  on  the  other  side  of  the  network partition. If a primary cannot
       communicate to the monitor it starts checking whether the secondary is still  connected.  In  PostgreSQL,
       the  secondary  connection automatically times out after 30 seconds. If last contact with the monitor and
       the last time a connection from the secondary was observed are both more than 30 seconds in the past, the
       primary concludes it is on the losing side of a network partition and shuts itself down. It may  be  that
       the secondary and the monitor were actually down and the primary was the only node that was alive, but we
       currently  do not have a way to distinguish such a situation.  As with consensus algorithms, availability
       can only be correctly preserved if at least 2 out of 3 nodes are up.

       In asymmetric network partitions, the primary might still be able to talk to the secondary, while  unable
       to talk to the monitor. During failover, the monitor therefore assigns the secondary the stop_replication
       state,  which  will  cause it to disconnect from the primary. After that, the primary is expected to shut
       down after at least 30 and at most 60 seconds. To factor in worst-case scenarios, the monitor  waits  for
       90 seconds before promoting the secondary to become the new primary.

INSTALLING PG_AUTO_FAILOVER

       We provide native system packages for pg_auto_failover on most popular Linux distributions.

       Use the steps below to install pg_auto_failover on PostgreSQL 11. At the current time pg_auto_failover is
       compatible with both PostgreSQL 10 and PostgreSQL 11.

   Ubuntu or Debian
   Quick install
       The  following  installation method downloads a bash script that automates several steps. The full script
       is available for review at our package cloud installation instructions page.

          # add the required packages to your system
          curl https://install.citusdata.com/community/deb.sh | sudo bash

          # install pg_auto_failover
          sudo apt-get install postgresql-11-auto-failover

          # confirm installation
          /usr/bin/pg_autoctl --version

   Manual Installation
       If you'd prefer to set up the package repository on your system manually, follow the instructions from
       the package cloud manual installation page. This page guides you through the specific details of the
       following 3 steps:

          1. install CitusData GnuPG key for its package repository

          2. install a new apt source for CitusData packages

          3. update your available package list

       Then when that's done, you can proceed with installing pg_auto_failover itself as in the previous case:

          # install pg_auto_failover
          sudo apt-get install postgresql-11-auto-failover

          # confirm installation
          /usr/bin/pg_autoctl --version

   Fedora, CentOS, or Red Hat
   Quick install
       The following installation method downloads a bash script that automates several steps. The full script
       is available for review at our package cloud installation instructions page.

          # add the required packages to your system
          curl https://install.citusdata.com/community/rpm.sh | sudo bash

          # install pg_auto_failover
          sudo yum install -y pg-auto-failover14_12

          # confirm installation
          /usr/pgsql-12/bin/pg_autoctl --version

   Manual installation
       If you'd prefer to set up the package repository on your system manually, follow the instructions from
       the package cloud manual installation page. This page guides you through the specific details of the
       following 3 steps:

          1. install the pygpgme yum-utils packages for your distribution

           2. install a new RPM repository for CitusData packages

          3. update your local yum cache

       Then when that's done, you can proceed with installing pg_auto_failover itself as in the previous case:

          # install pg_auto_failover
          sudo yum install -y pg-auto-failover14_12

          # confirm installation
          /usr/pgsql-12/bin/pg_autoctl --version

   Installing a pgautofailover Systemd unit
       The command pg_autoctl show systemd outputs a systemd unit file that you can use to set up a boot-time
       registered service for pg_auto_failover on your machine.

       Here's a sample output from the command:

          $ export PGDATA=/var/lib/postgresql/monitor
          $ pg_autoctl show systemd
          13:44:34 INFO  HINT: to complete a systemd integration, run the following commands:
          13:44:34 INFO  pg_autoctl -q show systemd --pgdata "/var/lib/postgresql/monitor" | sudo tee /etc/systemd/system/pgautofailover.service
          13:44:34 INFO  sudo systemctl daemon-reload
          13:44:34 INFO  sudo systemctl start pgautofailover
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /var/lib/postgresql
          Environment = 'PGDATA=/var/lib/postgresql/monitor'
          User = postgres
          ExecStart = /usr/lib/postgresql/10/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0

          [Install]
          WantedBy = multi-user.target

       Copy/pasting the commands given in the hint output will register the pgautofailover service on your
       system, when using systemd.
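
       To also start the service automatically at boot time, enable the unit:

           $ sudo systemctl enable pgautofailover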

       It is important that PostgreSQL is started by pg_autoctl rather than by systemd itself, as it might be
       that a failover has been done during a reboot, for instance, and that once the reboot completes we want
       the local Postgres to re-join as a secondary node where it used to be a primary node.

SECURITY SETTINGS FOR PG_AUTO_FAILOVER

       In  order  to  be  able  to  orchestrate  fully automated failovers, pg_auto_failover needs to be able to
       establish the following Postgres connections:

          • from the monitor node to each Postgres node to check the node's “health”

          • from each Postgres node to the monitor to implement our node_active protocol and fetch  the  current
            assigned state for this node

          • from the secondary node to the primary node for Postgres streaming replication.

       Postgres  Client  authentication is controlled by a configuration file: pg_hba.conf. This file contains a
       list of rules where each rule may allow or reject a connection attempt.

       For pg_auto_failover to work as intended, some HBA rules need to be added to each node configuration.
       You can choose to provision the pg_hba.conf file yourself thanks to the pg_autoctl option --skip-pg-hba,
       or you can use the following options to control which kind of rules are going to be added for you.

   Postgres HBA rules
       For your application to be able to connect to the current  Postgres  primary  servers,  some  application
       specific  HBA  rules  have  to  be  added  to  pg_hba.conf.  There  is  no  provision  for  doing that in
       pg_auto_failover.

       In other words, it is expected that you have to edit pg_hba.conf to open connections for your application
       needs.
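
       For instance, an application rule added to pg_hba.conf on every Postgres node might look like the
       following, where the database, user, and network are placeholders to replace with your own values:

           # allow the application user to connect from the application network
           hostssl appdb appuser 10.0.0.0/8 scram-sha-256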

   The trust security model
       As its name suggests, the trust security model does not enable any kind of security validation. This
       setting is popular for testing deployments though, as it makes it very easy to verify that everything
       works as intended before putting security restrictions in place.

       To enable a “trust” security model with pg_auto_failover, use the pg_autoctl  option  --auth  trust  when
       creating nodes:

          $ pg_autoctl create monitor --auth trust ...
          $ pg_autoctl create postgres --auth trust ...
          $ pg_autoctl create postgres --auth trust ...

       When  using  --auth  trust  pg_autoctl adds new HBA rules in the monitor and the Postgres nodes to enable
       connections as seen above.

   Authentication with passwords
       To set up pg_auto_failover with passwords for connections, you can use one of the password based
       authentication methods supported by Postgres, such as password or scram-sha-256. We recommend the
       latter, as in the following example:

          $ pg_autoctl create monitor --auth scram-sha-256 ...

       The pg_autoctl command does not set the password for you. The first step is to set the database user
       password in the monitor database thanks to the following command:

          $ psql postgres://monitor.host/pg_auto_failover
          > alter user autoctl_node password 'h4ckm3';

       Now that the monitor is ready with our password set for the autoctl_node user, we can use the password in
       the monitor connection string used when creating Postgres nodes.

       On the primary node, we can create the Postgres setup as usual, and then set our replication password,
       which we will use if we are demoted and later re-join as a standby:

          $ pg_autoctl create postgres       \
                 --auth scram-sha-256        \
                 ...                         \
                 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover

          $ pg_autoctl config set replication.password h4ckm3m0r3

       The  second  Postgres  node  is  going  to  be  initialized  as  a  secondary  and  pg_autoctl then calls
       pg_basebackup at create time. We need to have the replication password already set at this time,  and  we
       can achieve that the following way:

          $ export PGPASSWORD=h4ckm3m0r3
          $ pg_autoctl create postgres       \
                 --auth scram-sha-256        \
                 ...                         \
                 --monitor postgres://autoctl_node:h4ckm3@monitor.host/pg_auto_failover

          $ pg_autoctl config set replication.password h4ckm3m0r3

       Note  that you can use The Password File mechanism as discussed in the Postgres documentation in order to
       maintain your passwords in a separate file, not in your main pg_auto_failover  configuration  file.  This
       also avoids using passwords in the environment and in command lines.
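
       As a sketch, the ~/.pgpass entries matching the passwords used in this example could look like the
       following (the file must belong to the user running pg_autoctl and have 0600 permissions):

           # hostname:port:database:username:password
           monitor.host:5432:pg_auto_failover:autoctl_node:h4ckm3
           *:*:replication:pgautofailover_replicator:h4ckm3m0r3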

   Encryption of network communications
       Postgres   knows  how  to  use  SSL  to  enable  network  encryption  of  all  communications,  including
       authentication with passwords and the whole data set when streaming replication is used.

       To enable SSL on the server an SSL certificate is needed. It could be as simple as a self-signed
       certificate, and pg_autoctl creates such a certificate for you when using the --ssl-self-signed command
       line option:

          $ pg_autoctl create monitor --ssl-self-signed ...      \
                                      --auth scram-sha-256 ...   \
                                      --ssl-mode require         \
                                      ...

          $ pg_autoctl create postgres --ssl-self-signed ...      \
                                       --auth scram-sha-256 ...   \
                                       ...

          $ pg_autoctl create postgres --ssl-self-signed ...      \
                                       --auth scram-sha-256 ...   \
                                       ...

       In that example we set up SSL connections to encrypt the network traffic, and we still have to set up an
       authentication mechanism exactly as in the previous sections of this document. Here scram-sha-256 has
       been selected, and the password will be sent over an encrypted channel.

       When using the --ssl-self-signed option,  pg_autoctl  creates  a  self-signed  certificate,  as  per  the
       Postgres documentation at the Creating Certificates page.

       The certificate subject CN defaults to the --hostname parameter, which can be given explicitly or
       computed by pg_autoctl as either your hostname when you have proper DNS resolution, or your current IP
       address.

       Self-signed  certificates  provide  protection against eavesdropping; this setup does NOT protect against
       Man-In-The-Middle attacks nor Impersonation attacks. See PostgreSQL documentation page  SSL  Support  for
       details.

   Using your own SSL certificates
       In many cases you will want to install certificates provided by your local security department and signed
       by  a  trusted  Certificate Authority. In that case one solution is to use --skip-pg-hba and do the whole
       setup yourself.

       It is still possible to give the certificates to pg_auto_failover and have it handle the  Postgres  setup
       for you:

          $ pg_autoctl create monitor --ssl-ca-file root.crt   \
                                      --ssl-crl-file root.crl  \
                                      --server-cert server.crt  \
                                      --server-key server.key  \
                                      --ssl-mode verify-full \
                                      ...

          $ pg_autoctl create postgres --ssl-ca-file root.crt   \
                                       --server-cert server.crt  \
                                       --server-key server.key  \
                                       --ssl-mode verify-full \
                                       ...

          $ pg_autoctl create postgres --ssl-ca-file root.crt   \
                                       --server-cert server.crt  \
                                       --server-key server.key  \
                                       --ssl-mode verify-full \
                                       ...

       The option --ssl-mode can be used to force connection strings used by pg_autoctl to contain your
       preferred ssl mode. It defaults to require when using --ssl-self-signed and to allow when --no-ssl is
       used. Here, we set --ssl-mode to verify-full which requires SSL Certificates Authentication, covered
       next.

       The default --ssl-mode when providing  your  own  certificates  (signed  by  your  trusted  CA)  is  then
       verify-full. This setup applies to the client connection where the server identity is going to be checked
       against the root certificate provided with --ssl-ca-file and the revocation list optionally provided with
       the  --ssl-crl-file.  Both  those  files  are used as the respective parameters sslrootcert and sslcrl in
       pg_autoctl connection strings to both the monitor and the streaming replication primary server.
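
       As an illustration, the monitor connection string used by pg_autoctl then carries those parameters,
       along these lines:

           postgres://autoctl_node@monitor.host/pg_auto_failover?sslmode=verify-full&sslrootcert=root.crt&sslcrl=root.crl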

   SSL Certificates Authentication
       Given those files, it is then possible to use certificate based authentication of client connections. For
       that, it is necessary to prepare client certificates signed by your  root  certificate  private  key  and
       using the target user name as its CN, as per Postgres documentation for Certificate Authentication:
          The  cn  (Common  Name)  attribute  of the certificate will be compared to the requested database user
          name, and if they match the login will be allowed

       To enable the cert authentication method with pg_auto_failover, you need to prepare a client certificate
       for the user postgres, used by pg_autoctl when connecting to the monitor. Place it in
       ~/.postgresql/postgresql.crt along with its key ~/.postgresql/postgresql.key, in the home directory of
       the user that runs the pg_autoctl service (which defaults to postgres).
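
       For illustration only, such a client certificate could be prepared with openssl along the following
       lines, assuming root.crt and root.key stand for your CA certificate and private key, and running as the
       user that runs pg_autoctl:

           $ mkdir -p ~/.postgresql

           # create a key and a certificate signing request with CN=postgres
           $ openssl req -new -nodes -subj "/CN=postgres"  \
                 -keyout ~/.postgresql/postgresql.key      \
                 -out ~/.postgresql/postgresql.csr

           # sign the request with your CA to produce the client certificate
           $ openssl x509 -req -in ~/.postgresql/postgresql.csr  \
                 -CA root.crt -CAkey root.key -CAcreateserial    \
                 -out ~/.postgresql/postgresql.crt -days 365

           $ chmod 0600 ~/.postgresql/postgresql.key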

       Then  you  need  to  create  a  user  name map as documented in Postgres page User Name Maps so that your
       certificate can be used to authenticate pg_autoctl users.

       The ident map in pg_ident.conf on the pg_auto_failover monitor should then have the following  entry,  to
       allow postgres to connect as the autoctl_node user for pg_autoctl operations:

          # MAPNAME       SYSTEM-USERNAME         PG-USERNAME

          # pg_autoctl runs as postgres and connects to the monitor autoctl_node user
          pgautofailover   postgres               autoctl_node

       To  enable  streaming  replication,  the  pg_ident.conf  file  on each Postgres node should now allow the
       postgres user in the client certificate to connect as the pgautofailover_replicator database user:

          # MAPNAME       SYSTEM-USERNAME         PG-USERNAME

          # pg_autoctl runs as postgres and connects to the monitor autoctl_node user
          pgautofailover  postgres                pgautofailover_replicator

       Given that user name map, you can then use the cert authentication  method.  As  with  the  pg_ident.conf
       provisioning, it is best to now provision the HBA rules yourself, using the --skip-pg-hba option:

          $ pg_autoctl create postgres --skip-pg-hba --ssl-ca-file ...

       The  HBA  rule  will  use  the authentication method cert with a map option, and might then look like the
       following on the monitor:

          # allow certificate based authentication to the monitor
          hostssl pg_auto_failover autoctl_node 10.0.0.0/8 cert map=pgautofailover

       Then your pg_auto_failover nodes on the 10.0.0.0 network are allowed to connect to the monitor  with  the
       user autoctl_node used by pg_autoctl, assuming they have a valid and trusted client certificate.

       The  HBA  rule to use on the Postgres nodes to allow for Postgres streaming replication connections looks
       like the following:

          # allow streaming replication for pg_auto_failover nodes
          hostssl replication pgautofailover_replicator 10.0.0.0/8 cert map=pgautofailover

       Because the Postgres server runs as the postgres system user, the connection to the primary node can be
       made with SSL enabled and will then use the client certificates installed in the postgres home directory
       in the ~/.postgresql/postgresql.{key,crt} locations.

   Postgres HBA provisioning
       While pg_auto_failover knows how to manage the Postgres HBA rules that are necessary for your streaming
       replication needs and for its monitor protocol, it will not manage the Postgres HBA rules that are
       needed for your applications.

       If you have your own HBA provisioning solution, you can include the rules needed for pg_auto_failover and
       then use the --skip-pg-hba option to the pg_autoctl create commands.

   Enable SSL connections on an existing setup
       Whether you upgrade pg_auto_failover from a previous version that did not have support for the SSL
       features, or when you started with --no-ssl and later change your mind, it is possible with
       pg_auto_failover to add SSL settings on a system that has already been set up without explicit SSL
       support.

       In this section we detail how to upgrade to SSL settings.

       Installing Self-Signed certificates on top of an already existing pg_auto_failover setup is done with
       one of the following pg_autoctl command variants, depending on whether you want self-signed certificates
       or fully verified SSL certificates:

           $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require

          $ pg_autoctl enable ssl --ssl-ca-file root.crt   \
                                  --ssl-crl-file root.crl  \
                                  --server-cert server.crt  \
                                  --server-key server.key  \
                                  --ssl-mode verify-full

       The pg_autoctl enable ssl command edits the postgresql-auto-failover.conf Postgres configuration file to
       match the command line arguments given, enables SSL as instructed, and then updates the pg_autoctl
       configuration.

       The connection string to connect to the monitor is also automatically updated by  the  pg_autoctl  enable
       ssl command. You can verify your new configuration with:

          $ pg_autoctl config get pg_autoctl.monitor

       Note that an already running pg_autoctl daemon will try to reload its configuration after pg_autoctl
       enable ssl has finished. In some cases this is not possible to do without a restart. So be sure to check
       the logs from a running daemon to confirm that the reload succeeded. If it did not, you may need to
       restart the daemon to ensure the new connection string is used.
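
       For example, when pg_autoctl runs under the systemd unit shown earlier in this document, the service can
       be checked and restarted with the following commands (a sketch):

           $ pg_autoctl status
           $ sudo systemctl restart pgautofailover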

       The HBA settings are not edited, irrespective of whether --skip-pg-hba was used at creation time. That's
       because host records match both SSL and non-SSL connection attempts in the Postgres HBA file, so the
       pre-existing setup will continue to work. To enhance the SSL setup, you can manually edit the HBA files
       and change the existing lines from host to hostssl to disallow unencrypted connections on the server
       side.

       In summary, to upgrade an existing pg_auto_failover setup to enable SSL:

          1. run the pg_autoctl enable ssl command on your monitor and then all the Postgres nodes,

          2. on the Postgres nodes, review your pg_autoctl logs to make sure that the reload operation has  been
             effective, and review your Postgres settings to verify that you have the expected result,

          3. review  your  HBA rules setup to change the pg_auto_failover rules from host to hostssl to disallow
             insecure connections.

MANUAL PAGES

       The pg_autoctl tool hosts many commands and sub-commands. Each of them has its own manual page.

   pg_autoctl
       pg_autoctl - control a pg_auto_failover node

   Synopsis
       pg_autoctl provides the following commands:

          + create   Create a pg_auto_failover node, or formation
          + drop     Drop a pg_auto_failover node, or formation
          + config   Manages the pg_autoctl configuration
          + show     Show pg_auto_failover information
          + enable   Enable a feature on a formation
          + disable  Disable a feature on a formation
          + get      Get a pg_auto_failover node, or formation setting
          + set      Set a pg_auto_failover node, or formation setting
          + perform  Perform an action orchestrated by the monitor
            run      Run the pg_autoctl service (monitor or keeper)
            watch    Display a dashboard to watch monitor's events and state
            stop     signal the pg_autoctl service for it to stop
            reload   signal the pg_autoctl for it to reload its configuration
            status   Display the current status of the pg_autoctl service
            help     print help message
            version  print pg_autoctl version

          pg_autoctl create
            monitor    Initialize a pg_auto_failover monitor node
            postgres   Initialize a pg_auto_failover standalone postgres node
            formation  Create a new formation on the pg_auto_failover monitor

          pg_autoctl drop
            monitor    Drop the pg_auto_failover monitor
            node       Drop a node from the pg_auto_failover monitor
            formation  Drop a formation on the pg_auto_failover monitor

          pg_autoctl config
            check  Check pg_autoctl configuration
            get    Get the value of a given pg_autoctl configuration variable
            set    Set the value of a given pg_autoctl configuration variable

          pg_autoctl show
            uri            Show the postgres uri to use to connect to pg_auto_failover nodes
            events         Prints monitor's state of nodes in a given formation and group
            state          Prints monitor's state of nodes in a given formation and group
            settings       Print replication settings for a formation from the monitor
            standby-names  Prints synchronous_standby_names for a given group
            file           List pg_autoctl internal files (config, state, pid)
            systemd        Print systemd service file for this node

          pg_autoctl enable
            secondary    Enable secondary nodes on a formation
            maintenance  Enable Postgres maintenance mode on this node
            ssl          Enable SSL configuration on this node

          pg_autoctl disable
            secondary    Disable secondary nodes on a formation
            maintenance  Disable Postgres maintenance mode on this node
            ssl          Disable SSL configuration on this node

          pg_autoctl get
          + node       get a node property from the pg_auto_failover monitor
          + formation  get a formation property from the pg_auto_failover monitor

          pg_autoctl get node
            replication-quorum  get replication-quorum property from the monitor
            candidate-priority  get candidate property from the monitor

          pg_autoctl get formation
            settings              get replication settings for a formation from the monitor
            number-sync-standbys  get number_sync_standbys for a formation from the monitor

          pg_autoctl set
          + node       set a node property on the monitor
          + formation  set a formation property on the monitor

          pg_autoctl set node
            metadata            set metadata on the monitor
            replication-quorum  set replication-quorum property on the monitor
            candidate-priority  set candidate property on the monitor

          pg_autoctl set formation
            number-sync-standbys  set number-sync-standbys for a formation on the monitor

          pg_autoctl perform
            failover    Perform a failover for given formation and group
            switchover  Perform a switchover for given formation and group
            promotion   Perform a failover that promotes a target node

   Description
       The pg_autoctl tool is the client tool provided by pg_auto_failover to create and manage  Postgres  nodes
       and  the  pg_auto_failover monitor node. The command is built with many sub-commands that each have their
       own manual page.

   Help
       To get the full recursive list of supported commands, use:

          pg_autoctl help

   Version
       To grab the version of pg_autoctl that you're using, use:

          pg_autoctl --version
          pg_autoctl version

       A typical output would be:

          pg_autoctl version 1.4.2
          pg_autoctl extension version 1.4
          compiled with PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
          compatible with Postgres 10, 11, 12, and 13

       The version is also available as a JSON document when using the --json option:

          pg_autoctl --version --json
          pg_autoctl version --json

       A typical JSON output would be:

          {
              "pg_autoctl": "1.4.2",
              "pgautofailover": "1.4",
              "pg_major": "12",
              "pg_version": "12.3",
              "pg_version_str": "PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit",
              "pg_version_num": 120003
          }

       This is for version 1.4.2 of pg_auto_failover. This particular version of the pg_autoctl client tool  has
       been compiled using libpq for PostgreSQL 12.3 and is compatible with Postgres 10, 11, 12, and 13.

   pg_autoctl create
       pg_autoctl create - Create a pg_auto_failover node, or formation

   pg_autoctl create monitor
       pg_autoctl create monitor - Initialize a pg_auto_failover monitor node

   Synopsis
       This  command  initializes  a  PostgreSQL  cluster and installs the pgautofailover extension so that it's
       possible to use the new instance to monitor PostgreSQL services:

          usage: pg_autoctl create monitor  [ --pgdata --pgport --pgctl --hostname ]

          --pgctl           path to pg_ctl
          --pgdata          path to data directory
          --pgport          PostgreSQL's port number
          --hostname        hostname by which postgres is reachable
          --auth            authentication method for connections from data nodes
          --skip-pg-hba     skip editing pg_hba.conf rules
          --run             create node then run pg_autoctl service
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Description
       The pg_autoctl tool is the client tool provided by pg_auto_failover to create and manage  Postgres  nodes
       and  the  pg_auto_failover monitor node. The command is built with many sub-commands that each have their
       own manual page.

   Options
       The following options are available to pg_autoctl create monitor:

       --pgctl
              Path to the pg_ctl tool to use for the version of PostgreSQL you want to use.

              Defaults to the pg_ctl found in the PATH when there is a single entry  for  pg_ctl  in  the  PATH.
              Check your setup using which -a pg_ctl.

              When  using  an  RPM  based  distribution  such  as  RHEL  or  CentOS,  the  path would usually be
              /usr/pgsql-13/bin/pg_ctl for Postgres 13.

              When using a debian based distribution such as  debian  or  ubuntu,  the  path  would  usually  be
              /usr/lib/postgresql/13/bin/pg_ctl  for  Postgres  13.   Those  distributions  also use the package
              postgresql-common which provides /usr/bin/pg_config.  This  tool  can  be  automatically  used  by
              pg_autoctl to discover the default version of Postgres to use on your setup.

       --pgdata
              Location  where  to  initialize  a  Postgres  database  cluster,  using  either  pg_ctl  initdb or
              pg_basebackup. Defaults to the environment variable PGDATA.

       --pgport
              Postgres port to use, defaults to 5432.

       --hostname
              Hostname or IP address (both v4 and v6 are supported) to use from any other  node  to  connect  to
              this node.

              When not provided, a default value is computed by running the following algorithm.

                  1. We get this machine's "public IP" by opening a connection to the 8.8.8.8:53 public service.
                     Then we get the TCP/IP client address that was used to make that connection.

                 2. We then do a reverse DNS lookup on the IP address found in the  previous  step  to  fetch  a
                    hostname for our local machine.

                  3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS lookup of that
                     hostname.

              When the forward DNS lookup response in step 3. is an IP address found in one of our local network
              interfaces, then pg_autoctl uses the  hostname  found  in  step  2.  as  the  default  --hostname.
              Otherwise it uses the IP address found in step 1.

              You  may  use  the --hostname command line option to bypass the whole DNS lookup based process and
              force the local node name to a fixed value.

        --auth Authentication method used by pg_autoctl when editing the Postgres HBA file to open connections to
               other nodes. No default value, must be provided by the user. The value trust is only a good
               choice for testing and evaluation of pg_auto_failover, see security for more information.

       --skip-pg-hba
               When this option is used, pg_autoctl refrains from any editing of the Postgres HBA file. Please
               note that editing the HBA file is still needed so that other nodes can connect using either
               read privileges or replication streaming privileges.

               When --skip-pg-hba is used, pg_autoctl still outputs the HBA entries it needs in the logs; it only
               skips editing the HBA file.

       --run  Immediately run the pg_autoctl service after having created this node.

       --ssl-self-signed
              Generate SSL self-signed certificates to provide network encryption. This does not protect against
              man-in-the-middle kinds of attacks. See security for more about our SSL settings.

       --ssl-mode
              SSL Mode used by pg_autoctl  when  connecting  to  other  nodes,  including  when  connecting  for
              streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.
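
        For illustration, a monitor node could be created with a command such as the following, where the
        --pgdata path and the --hostname are placeholder values to adapt to your environment, and where --auth
        trust is only suitable for testing and evaluation:

           $ pg_autoctl create monitor \
                --pgdata /var/lib/postgresql/monitor \
                --pgport 5432 \
                --hostname monitor.example.com \
                --auth trust \
                --ssl-self-signed \
                --run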

   pg_autoctl create postgres
       pg_autoctl create postgres - Initialize a pg_auto_failover postgres node

   Synopsis
       The  command  pg_autoctl  create  postgres  initializes  a standalone Postgres node to a pg_auto_failover
       monitor. The monitor is then handling auto-failover for this Postgres node (as soon as  a  secondary  has
       been registered too, and is known to be healthy).

          usage: pg_autoctl create postgres

            --pgctl           path to pg_ctl
            --pgdata          path to data directory
            --pghost          PostgreSQL's hostname
            --pgport          PostgreSQL's port number
            --listen          PostgreSQL's listen_addresses
            --username        PostgreSQL's username
            --dbname          PostgreSQL's database name
            --name            pg_auto_failover node name
            --hostname        hostname used to connect from the other nodes
            --formation       pg_auto_failover formation
            --monitor         pg_auto_failover Monitor Postgres URL
            --auth            authentication method for connections from monitor
            --skip-pg-hba     skip editing pg_hba.conf rules
            --pg-hba-lan      edit pg_hba.conf rules for --dbname in detected LAN
            --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
            --ssl-mode        use that sslmode in connection strings
            --ssl-ca-file     set the Postgres ssl_ca_file to that file path
            --ssl-crl-file    set the Postgres ssl_crl_file to that file path
            --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
            --server-key      set the Postgres ssl_key_file to that file path
            --server-cert     set the Postgres ssl_cert_file to that file path
            --candidate-priority    priority of the node to be promoted to become primary
            --replication-quorum    true if node participates in write quorum
            --maximum-backup-rate   maximum transfer rate of data transferred from the server during initial sync

   Description
       Three  different  modes  of  initialization  are  supported  by  this  command,  corresponding to as many
       implementation strategies.

          1. Initialize a primary node from scratch

              This happens when --pgdata (or the environment variable PGDATA) points to a non-existing or empty
             directory. Then the given --hostname is registered to the pg_auto_failover --monitor as a member of
             the --formation.

              The monitor answers the registration call with a state to assign to the new member of the group,
              either SINGLE or WAIT_STANDBY. When the assigned state is SINGLE, then pg_autoctl create postgres
              proceeds to initialize a new PostgreSQL instance.

          2. Initialize an already existing primary server

             This happens when --pgdata (or the environment variable  PGDATA)  points  to  an  already  existing
             directory  that  belongs  to  a PostgreSQL instance. The standard PostgreSQL tool pg_controldata is
             used to recognize whether the directory belongs to a PostgreSQL instance.

              In that case, the given --hostname is registered to the monitor in the tentative SINGLE state. When
              the given --formation and --group are currently empty, then the monitor accepts the registration
              and pg_autoctl create prepares the already existing primary server for pg_auto_failover.

          3. Initialize a secondary node from scratch

             This happens when --pgdata (or the environment variable PGDATA) points to a non-existing  or  empty
             directory, and when the monitor registration call assigns the state WAIT_STANDBY in step 1.

             In  that  case,  the  pg_autoctl  create  command steps through the initial states of registering a
             secondary server, which includes preparing the primary server PostgreSQL HBA rules and  creating  a
             replication slot.

              When the command ends successfully, a PostgreSQL secondary server has been created with
              pg_basebackup and is now started, catching up to the primary server.

          4. Initialize a secondary node from an existing data directory

              When the data directory pointed to by the option --pgdata or the environment variable PGDATA
              already exists, then pg_auto_failover verifies that the system identifier matches that of the
              other nodes already registered in the same group.

              The system identifier can be obtained with the command pg_controldata. All nodes in a physical
              replication setting must have the same system identifier, and so in pg_auto_failover all the nodes
              in the same group have that constraint too.

              When the system identifier matches the already registered system identifier of other nodes in the
              same group, then the node is set up as a standby and Postgres is started with the primary_conninfo
              pointed at the current primary.

        The --auth option allows setting up the authentication method to be used when the monitor node makes a
        connection to a data node with the pgautofailover_monitor user. As with the pg_autoctl_create_monitor
        command, you could use --auth trust when playing with pg_auto_failover at first and consider something
        production grade later. Also, consider using --skip-pg-hba if you already have your own provisioning
        tools with a security compliance process.

        See security for notes on .pgpass.
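
        For illustration, a first Postgres node could be registered to an existing monitor with a command such
        as the following. The data directory, node name, hostname, and monitor URI are placeholder values to
        adapt to your environment; running the same command on another machine, with its own --pgdata, --name,
        and --hostname, would then register a secondary node:

           $ pg_autoctl create postgres \
                --pgdata /var/lib/postgresql/node1 \
                --name node1 \
                --hostname node1.example.com \
                --auth trust \
                --ssl-self-signed \
                --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=prefer' \
                --run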

   Options
       The following options are available to pg_autoctl create postgres:

       --pgctl
              Path to the pg_ctl tool to use for the version of PostgreSQL you want to use.

              Defaults to the pg_ctl found in the PATH when there is a single entry  for  pg_ctl  in  the  PATH.
              Check your setup using which -a pg_ctl.

              When  using  an  RPM  based  distribution  such  as  RHEL  or  CentOS,  the  path would usually be
              /usr/pgsql-13/bin/pg_ctl for Postgres 13.

              When using a debian based distribution such as  debian  or  ubuntu,  the  path  would  usually  be
              /usr/lib/postgresql/13/bin/pg_ctl  for  Postgres  13.   Those  distributions  also use the package
              postgresql-common which provides /usr/bin/pg_config.  This  tool  can  be  automatically  used  by
              pg_autoctl to discover the default version of Postgres to use on your setup.

       --pgdata
              Location  where  to  initialize  a  Postgres  database  cluster,  using  either  pg_ctl  initdb or
              pg_basebackup. Defaults to the environment variable PGDATA.

       --pghost
               Hostname to use when connecting to the local Postgres instance from the pg_autoctl process. By
               default, this field is left blank in the connection string, allowing the use of Unix Domain
               Sockets with the default path compiled into your libpq version, usually provided by the Operating
               System. That would be /var/run/postgresql when using debian or ubuntu.

       --pgport
              Postgres port to use, defaults to 5432.

       --listen
              PostgreSQL's  listen_addresses  to  setup.  At  the  moment  only one address is supported in this
              command line option.

       --username
              PostgreSQL's username to use when connecting to the local Postgres instance to manage it.

       --dbname
              PostgreSQL's database name to use  in  your  application.  Defaults  to  being  the  same  as  the
              --username, or to postgres when none of those options are used.

        --name Node name used on the monitor to refer to this node. The hostname is a technical detail, and
               given Postgres requirements on the HBA setup and DNS resolution (both forward and reverse
               lookups), IP addresses are often used for the hostname.

              The --name option allows using a user-friendly name for your Postgres nodes.

       --hostname
              Hostname  or  IP  address  (both v4 and v6 are supported) to use from any other node to connect to
              this node.

              When not provided, a default value is computed by running the following algorithm.

                  1. We get this machine's "public IP" by opening a connection to the given monitor hostname or
                     IP address. Then we get the TCP/IP client address that was used to make that connection.

                 2. We  then  do  a  reverse  DNS lookup on the IP address found in the previous step to fetch a
                    hostname for our local machine.

                  3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS lookup of that
                     hostname.

              When the forward DNS lookup response in step 3. is an IP address found in one of our local network
              interfaces,  then  pg_autoctl  uses  the  hostname  found  in  step  2. as the default --hostname.
              Otherwise it uses the IP address found in step 1.

              You may use the --hostname command line option to bypass the whole DNS lookup  based  process  and
              force the local node name to a fixed value.

       --formation
               Formation to register the node into on the monitor. Defaults to the default formation, which is
               automatically created on the monitor by the pg_autoctl_create_monitor command.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node  username  and  target  the
              pg_auto_failover  database  name.  It  is  possible to show the Postgres URI from the monitor node
              using the command pg_autoctl_show_uri.

        --auth Authentication method used by pg_autoctl when editing the Postgres HBA file to open connections to
               other nodes. No default value, must be provided by the user. The value trust is only a good
               choice for testing and evaluation of pg_auto_failover, see security for more information.

       --skip-pg-hba
               When this option is used, pg_autoctl refrains from any editing of the Postgres HBA file. Please
               note that editing the HBA file is still needed so that other nodes can connect using either
               read privileges or replication streaming privileges.

               When --skip-pg-hba is used, pg_autoctl still outputs the HBA entries it needs in the logs; it only
               skips editing the HBA file.

       --pg-hba-lan
              When this option is used pg_autoctl determines the  local  IP  address  used  to  connect  to  the
              monitor,  and  retrieves  its netmask, and uses that to compute your local area network CIDR. This
              CIDR is then opened for connections in the Postgres HBA rules.

               For instance, when the monitor resolves to 192.168.0.1 and your local Postgres node uses an
               interface with IP address 192.168.0.2/255.255.255.0 to connect to the monitor, then the LAN CIDR
               is computed to be 192.168.0.0/24.

       --candidate-priority
               Sets this node's replication setting for candidate priority to the given value (between 0 and
               100) at node registration on the monitor. Defaults to 50.

       --replication-quorum
               Sets this node's replication setting for replication quorum to the given value (either true or
               false) at node registration on the monitor. Defaults to true, which enables synchronous
               replication.

       --maximum-backup-rate
              Sets  the  maximum  transfer rate of data transferred from the server during initial sync. This is
              used by pg_basebackup.  Defaults to 100M.

       --run  Immediately run the pg_autoctl service after having created this node.

       --ssl-self-signed
              Generate SSL self-signed certificates to provide network encryption. This does not protect against
              man-in-the-middle kinds of attacks. See security for more about our SSL settings.

       --ssl-mode
              SSL Mode used by pg_autoctl  when  connecting  to  other  nodes,  including  when  connecting  for
              streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.
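
        For illustration, the replication settings options documented above can be combined to create a standby
        that never becomes a candidate for failover and does not participate in the replication quorum. The node
        name, paths, and monitor URI below are placeholder values:

           $ pg_autoctl create postgres \
                --pgdata /var/lib/postgresql/node4 \
                --name node4 \
                --hostname node4.example.com \
                --auth trust \
                --ssl-self-signed \
                --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=prefer' \
                --candidate-priority 0 \
                --replication-quorum false \
                --run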

   pg_autoctl create formation
       pg_autoctl create formation - Create a new formation on the pg_auto_failover monitor

   Synopsis
       This command registers a new formation on the monitor, with the specified kind:

           usage: pg_autoctl create formation  [ --pgdata --monitor --formation --kind --dbname --enable-secondary --disable-secondary --number-sync-standbys ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   name of the formation to create
          --kind        formation kind, either "pgsql" or "citus"
          --dbname      name for postgres database to use in this formation
          --enable-secondary     create a formation that has multiple nodes that can be
                                 used for fail over when others have issues
          --disable-secondary    create a citus formation without nodes to fail over to
          --number-sync-standbys minimum number of standbys to confirm write

   Description
        A single pg_auto_failover monitor may manage any number of formations, each composed of at least one
        Postgres service group. This command creates a new formation so that it is then possible to register
        Postgres nodes in the new formation.

   Options
       The following options are available to pg_autoctl create formation:

       --pgdata
              Location  where  to  initialize  a  Postgres  database  cluster,  using  either  pg_ctl  initdb or
              pg_basebackup. Defaults to the environment variable PGDATA.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node  username  and  target  the
              pg_auto_failover  database  name.  It  is  possible to show the Postgres URI from the monitor node
              using the command pg_autoctl_show_uri.

       --formation
              Name of the formation to create.

        --kind A pg_auto_failover formation can be of kind pgsql or of kind citus. At the moment citus formation
               kinds are not managed in the Open Source version of pg_auto_failover.

       --dbname
               Name of the database to use in the formation, mostly useful for formations of kind citus, where
               the Citus extension is only installed in a single target database.

       --enable-secondary
              The formation to be created allows using standby nodes. Defaults to true. Mostly useful for  Citus
              formations.

       --disable-secondary
              See --enable-secondary above.

        --number-sync-standbys
               Postgres streaming replication uses synchronous_standby_names to set up how many standby nodes
               should have received a copy of the transaction data. When using pg_auto_failover this setup is
               handled at the formation level.

               Defaults to zero when creating the first two Postgres nodes in a formation in the same group. When
               set to zero, pg_auto_failover uses synchronous replication only when a standby node is available:
               the idea is to allow failover, even though this setting does not provide proper HA for Postgres.

              When adding a third node that participates in the  quorum  (one  primary,  two  secondaries),  the
              setting is automatically changed from zero to one.
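
        For illustration, a new formation of kind pgsql could be registered with a command such as the
        following; the formation name, database name, and monitor URI are placeholder values:

           $ pg_autoctl create formation \
                --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=prefer' \
                --formation sales \
                --kind pgsql \
                --dbname sales \
                --enable-secondary \
                --number-sync-standbys 1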

   pg_autoctl drop
       pg_autoctl drop - Drop a pg_auto_failover node, or formation

   pg_autoctl drop monitor
       pg_autoctl drop monitor - Drop the pg_auto_failover monitor

   Synopsis
        This command drops the pg_auto_failover monitor node:

          usage: pg_autoctl drop monitor [ --pgdata --destroy ]

          --pgdata      path to data directory
          --destroy     also destroy Postgres database

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --destroy
               By default the pg_autoctl drop monitor command does not remove the Postgres database for the
               monitor. When using --destroy, the Postgres installation is also deleted.
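
        For illustration, removing the monitor and its Postgres instance from the local node could look like the
        following, where the --pgdata path is a placeholder value:

           $ pg_autoctl drop monitor --pgdata /var/lib/postgresql/monitor --destroy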

   pg_autoctl drop node
       pg_autoctl drop node - Drop a node from the pg_auto_failover monitor

   Synopsis
       This command drops a Postgres node from the pg_auto_failover monitor:

          usage: pg_autoctl drop node [ [ [ --pgdata ] [ --destroy ] ] | [ --monitor [ [ --hostname --pgport ] | [ --formation --name ] ] ] ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   pg_auto_failover formation
          --name        drop the node with the given node name
          --hostname    drop the node with given hostname and pgport
          --pgport      drop the node with given hostname and pgport
          --destroy     also destroy Postgres database
          --force       force dropping the node from the monitor
          --wait        how many seconds to wait, default to 60

   Description
       Two modes of operations are implemented in the pg_autoctl drop node command.

        When removing a node that still exists, it is possible to use pg_autoctl drop node --destroy to remove
        the node from the monitor and also delete the local Postgres instance entirely.

       When removing a node that doesn't exist physically anymore, or when the VM that used to host the node has
       been  lost  entirely,  use  either  the  pair  of  options --hostname and --pgport or the pair of options
       --formation and --name to match the node registration record on the monitor database, and get it  removed
       from the known list of nodes on the monitor.

        The option --force can be used when the target node to remove does not exist anymore. When a node has
        been lost entirely, it is not going to be able to finish the procedure itself, and it is then possible to
        instruct the monitor of the situation.
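
        For illustration, removing a node whose machine has been lost entirely could use the monitor URI and the
        node name; both values below are placeholders:

           $ pg_autoctl drop node \
                --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=prefer' \
                --formation default \
                --name node3 \
                --force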

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

       --hostname
              Hostname  of  the  Postgres  node  to  remove  from  the  monitor. Use either --name or --hostname
              --pgport, but not both.

       --pgport
              Port of the Postgres node to remove from the monitor. Use either --name  or  --hostname  --pgport,
              but not both.

       --name Name  of  the  node  to remove from the monitor. Use either --name or --hostname --pgport, but not
              both.

       --destroy
               By default the pg_autoctl drop node command does not remove the Postgres database for the node.
               When using --destroy, the Postgres installation is also deleted.

       --force
              By  default  a  node  is  expected to reach the assigned state DROPPED when it is removed from the
              monitor, and has the opportunity to implement clean-up actions. When the target node to remove  is
              not  available  anymore,  it  is possible to use the option --force to immediately remove the node
              from the monitor.

        --wait How many seconds to wait for the node to be dropped entirely. The command stops when the target
               node can no longer be found on the monitor, or when the timeout has elapsed, whichever comes
               first. The value 0 (zero) disables the timeout and disables waiting entirely, making the command
               asynchronous.

   Examples
          $ pg_autoctl drop node --destroy --pgdata ./node3
          17:52:21 54201 INFO  Reaching assigned state "secondary"
          17:52:21 54201 INFO  Removing node with name "node3" in formation "default" from the monitor
          17:52:21 54201 WARN  Postgres is not running and we are in state secondary
          17:52:21 54201 WARN  Failed to update the keeper's state from the local PostgreSQL instance, see above for details.
          17:52:21 54201 INFO  Calling node_active for node default/4/0 with current state: PostgreSQL is running is false, sync_state is "", latest WAL LSN is 0/0.
          17:52:21 54201 INFO  FSM transition to "dropped": This node is being dropped from the monitor
          17:52:21 54201 INFO  Transition complete: current state is now "dropped"
          17:52:21 54201 INFO  This node with id 4 in formation "default" and group 0 has been dropped from the monitor
          17:52:21 54201 INFO  Stopping PostgreSQL at "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
          17:52:21 54201 INFO  /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl --pgdata /Users/dim/dev/MS/pg_auto_failover/tmux/node3 --wait stop --mode fast
          17:52:21 54201 INFO  /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl status -D /Users/dim/dev/MS/pg_auto_failover/tmux/node3 [3]
          17:52:21 54201 INFO  pg_ctl: no server running
          17:52:21 54201 INFO  pg_ctl stop failed, but PostgreSQL is not running anyway
          17:52:21 54201 INFO  Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/node3"
          17:52:21 54201 INFO  Removing "/Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node3/pg_autoctl.cfg"

   pg_autoctl drop formation
       pg_autoctl drop formation - Drop a formation on the pg_auto_failover monitor

   Synopsis
       This command drops an existing formation on the monitor:

          usage: pg_autoctl drop formation  [ --pgdata --formation ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   name of the formation to drop

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

       --formation
              Name of the formation to drop from the monitor.
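
        For illustration, dropping a formation named sales (a placeholder name) from the monitor could look like
        this; the monitor URI is also a placeholder value:

           $ pg_autoctl drop formation \
                --monitor 'postgres://autoctl_node@monitor.example.com:5432/pg_auto_failover?sslmode=prefer' \
                --formation sales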

   pg_autoctl config
       pg_autoctl config - Manages the pg_autoctl configuration

   pg_autoctl config get
       pg_autoctl config get - Get the value of a given pg_autoctl configuration variable

   Synopsis
       This command prints a pg_autoctl configuration setting:

          usage: pg_autoctl config get  [ --pgdata ] [ --json ] [ section.option ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

        --json Output JSON formatted data.

   Description
       When  the argument section.option is used, this is the name of a configuration ooption. The configuration
       file for pg_autoctl is stored using the INI format.

        When no argument is given to pg_autoctl config get, the entire configuration file is given in the output.
        To figure out where the configuration file is stored, see pg_autoctl_show_file and use pg_autoctl show
        file --config.

   Examples
       Without arguments, we get the entire file:

          $ pg_autoctl config get --pgdata node1
          [pg_autoctl]
          role = keeper
          monitor = postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
          formation = default
          group = 0
          name = node1
          hostname = localhost
          nodekind = standalone

          [postgresql]
          pgdata = /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl = /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          dbname = demo
          host = /tmp
          port = 5501
          proxyport = 0
          listen_addresses = *
          auth_method = trust
          hba_level = app

          [ssl]
          active = 1
          sslmode = require
          cert_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.crt
          key_file = /Users/dim/dev/MS/pg_auto_failover/tmux/node1/server.key

          [replication]
          maximum_backup_rate = 100M
          backup_directory = /Users/dim/dev/MS/pg_auto_failover/tmux/backup/node_1

          [timeout]
          network_partition_timeout = 20
          prepare_promotion_catchup = 30
          prepare_promotion_walreceiver = 5
          postgresql_restart_failure_timeout = 20
          postgresql_restart_failure_max_retries = 3

        It is possible to pipe JSON formatted output to the jq command line tool and filter the result down to a
        specific section of the file:

          $ pg_autoctl config get --pgdata node1 --json | jq .pg_autoctl
          {
            "role": "keeper",
            "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
            "formation": "default",
            "group": 0,
            "name": "node1",
            "hostname": "localhost",
            "nodekind": "standalone"
          }

       Finally, a single configuration element can be listed:

          $ pg_autoctl config get --pgdata node1 ssl.sslmode --json
          require

   pg_autoctl config set
       pg_autoctl config set - Set the value of a given pg_autoctl configuration variable

   Synopsis
        This command sets the value of a pg_autoctl configuration setting:

          usage: pg_autoctl config set  [ --pgdata ] [ --json ] section.option [ value ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

        --json Output JSON formatted data.

   Description
       This commands allows to set a pg_autoctl configuration setting to a new value. Most settings are possible
       to change and can be reloaded online.

        Some of those changes can then be applied to an already running process with a pg_autoctl reload
        command.
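
        For illustration, changing the bandwidth limit that pg_basebackup uses for the node managed in the node1
        directory (the replication.maximum_backup_rate setting described below) could look like this, where the
        50M value is an arbitrary example:

           $ pg_autoctl config set --pgdata node1 replication.maximum_backup_rate 50M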

   Settings
       pg_autoctl.role
          This setting can not be changed. It can be either monitor or keeper and the rest of the  configuration
          file is read depending on this value.

       pg_autoctl.monitor
          URI of the pg_autoctl monitor Postgres service. Can be changed with a reload.

          To  register  an  existing  node  to a new monitor, use pg_autoctl disable monitor and then pg_autoctl
          enable monitor.

       pg_autoctl.formation
          Formation to which this node has been registered. Changing this setting is not supported.

       pg_autoctl.group
          Group in which this node has been registered. Changing this setting is not supported.

       pg_autoctl.name
          Name of the node as known to the monitor and listed in pg_autoctl show state. Can be  changed  with  a
          reload.

       pg_autoctl.hostname
          Hostname or IP address of the node, as known to the monitor. Can be changed with a reload.

       pg_autoctl.nodekind
          This  setting  can  not  be  changed  and  depends  on  the  command that has been used to create this
          pg_autoctl node.

       postgresql.pgdata
          Directory where the managed Postgres instance is to be  created  (or  found)  and  managed.  Can't  be
          changed.

       postgresql.pg_ctl
          Path  to  the  pg_ctl  tool used to manage this Postgres instance.  Absolute path depends on the major
          version of Postgres and looks like /usr/lib/postgresql/13/bin/pg_ctl when using a debian or ubuntu OS.

          Can be changed after a major upgrade of Postgres.

       postgresql.dbname
          Name of the database that is used to connect to Postgres. Can be changed, but  then  must  be  changed
          manually on the monitor's pgautofailover.formation table with a SQL command.

          WARNING:
              When  using  pg_auto_failover  enterprise  edition  with Citus support, this is the database where
              pg_autoctl maintains the list of Citus nodes on the coordinator. Using the same database  name  as
              your application that uses Citus is then crucial.

       postgresql.host
           Hostname to use in connection strings when connecting from the local pg_autoctl process to the local
           Postgres database. Defaults to using the Operating System default value for the Unix Domain Socket
           directory, either /tmp, or /var/run/postgresql when using debian or ubuntu.

          Can be changed with a reload.

       postgresql.port
          Port  on  which  Postgres  should  be managed. Can be changed offline, between a pg_autoctl stop and a
          subsequent pg_autoctl start.

       postgresql.listen_addresses
          Value to set to Postgres parameter of the same name. At the moment pg_autoctl only supports  a  single
          address for this parameter.

       postgresql.auth_method
          Authentication  method  to  use  when  editing HBA rules to allow the Postgres nodes of a formation to
          connect to each other, and to the monitor, and to allow the monitor to connect to the nodes.

          Can be changed online with a reload, but actually adding new HBA  rules  requires  a  restart  of  the
          "node-active" service.

       postgresql.hba_level
          This  setting  reflects  the  choice of --skip-pg-hba or --pg-hba-lan that has been used when creating
          this pg_autoctl node. Can be changed with a reload, though the HBA rules  that  have  been  previously
          added will not get removed.

       ssl.active, ssl.sslmode, ssl.cert_file, ssl.key_file, etc
          Please  use  the command pg_autoctl enable ssl or pg_autoctl disable ssl to manage the SSL settings in
          the ssl section of the configuration. Using those commands, the settings can be changed online.

       replication.maximum_backup_rate
          Used as a parameter to pg_basebackup, defaults to 100M. Can be changed with a  reload.  Changing  this
          value does not affect an already running pg_basebackup command.

          Limiting  the  bandwidth used by pg_basebackup makes the operation slower, and still has the advantage
          of limiting the impact on the disks of the primary server.

       replication.backup_directory
          Target location of the pg_basebackup command used by pg_autoctl when creating a secondary  node.  When
          done with fetching the data over the network, then pg_autoctl uses the rename(2) system-call to rename
          the temporary download location to the target PGDATA location.

          The  rename(2)  system-call is known to be atomic when both the source and the target of the operation
          are using the same file system / mount point.

          Can be changed online with a reload, will not affect already running pg_basebackup sub-processes.

       replication.password
           Used as a parameter in the connection string to the upstream Postgres node. The "replication"
           connection uses the password set up in the pg_autoctl configuration file.

          Changing the replication.password of a pg_autoctl configuration has no effect on the Postgres database
          itself.  The  password  must  match what the Postgres upstream node expects, which can be set with the
          following SQL command run on the upstream server (primary or other standby node):

              alter user pgautofailover_replicator password 'h4ckm3m0r3';

           The replication.password can be changed online with a reload, but requires restarting the Postgres
           service to be activated. Postgres only reads the primary_conninfo connection string at start-up, up to
           and including Postgres 12. With Postgres 13 and following, it is possible to reload this Postgres
           parameter.
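
           For illustration, after running the SQL command above on the upstream node, the local pg_autoctl
           configuration could be updated to match, using the same placeholder password value:

               pg_autoctl config set --pgdata node1 replication.password 'h4ckm3m0r3'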

       timeout.network_partition_timeout
          Timeout (in seconds) that pg_autoctl waits before deciding that it is on the losing side of a  network
          partition.  When  pg_autoctl  fails  to  connect  to  the monitor and when the local Postgres instance
          pg_stat_replication system view is empty, and after this many seconds  have  passed,  then  pg_autoctl
          demotes itself.

          Can be changed with a reload.

       timeout.prepare_promotion_catchup
          Currently not used in the source code. Can be changed with a reload.

       timeout.prepare_promotion_walreceiver
          Currently not used in the source code. Can be changed with a reload.

       timeout.postgresql_restart_failure_timeout
          When  pg_autoctl  fails  to  start Postgres for at least this duration from the first attempt, then it
          starts reporting that Postgres is not running to the monitor, which might then decide to  implement  a
          failover.

          Can be changed with a reload.

       timeout.postgresql_restart_failure_max_retries
           When pg_autoctl fails to start Postgres for at least this many times, then it starts reporting that
           Postgres is not running to the monitor, which might then decide to implement a failover.

          Can be changed with a reload.

   pg_autoctl config check
       pg_autoctl config check - Check pg_autoctl configuration

   Synopsis
       This command implements a very basic list of sanity checks for a pg_autoctl node setup:

          usage: pg_autoctl config check  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

        --json Output JSON formatted data.

   Examples
          $ pg_autoctl config check --pgdata node1
          18:37:27 63749 INFO  Postgres setup for PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" is ok, running with PID 5501 and port 99698
          18:37:27 63749 INFO  Connection to local Postgres ok, using "port=5501 dbname=demo host=/tmp"
          18:37:27 63749 INFO  Postgres configuration settings required for pg_auto_failover are ok
          18:37:27 63749 WARN  Postgres 12.1 does not support replication slots on a standby node
          18:37:27 63749 INFO  Connection to monitor ok, using "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer"
          18:37:27 63749 INFO  Monitor is running version "1.5.0.1", as expected
          pgdata:                /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl:                /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          pg_version:            12.3
          pghost:                /tmp
          pgport:                5501
          proxyport:             0
          pid:                   99698
          is in recovery:        no
          Control Version:       1201
          Catalog Version:       201909212
          System Identifier:     6941034382470571312
          Latest checkpoint LSN: 0/6000098
          Postmaster status:     ready

   pg_autoctl show
       pg_autoctl show - Show pg_auto_failover information

   pg_autoctl show uri
       pg_autoctl show uri - Show the postgres uri to use to connect to pg_auto_failover nodes

   Synopsis
       This command outputs the monitor or the coordinator Postgres URI to use from an application to connect to
       Postgres:

          usage: pg_autoctl show uri  [ --pgdata --monitor --formation --json ]

            --pgdata      path to data directory
            --monitor     monitor uri
            --formation   show the coordinator uri of given formation
            --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node  username  and  target  the
              pg_auto_failover  database  name.  It  is  possible to show the Postgres URI from the monitor node
              using the command pg_autoctl show uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
               When --formation is used, print the Postgres URI of the given formation, as shown in the examples
               below.

        --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show uri
                  Type |    Name | Connection String
          -------------+---------+-------------------------------
               monitor | monitor | postgres://autoctl_node@localhost:5500/pg_auto_failover
             formation | default | postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer

          $ pg_autoctl show uri --formation monitor
          postgres://autoctl_node@localhost:5500/pg_auto_failover

          $ pg_autoctl show uri --formation default
          postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer

          $ pg_autoctl show uri --json
          [
           {
               "uri": "postgres://autoctl_node@localhost:5500/pg_auto_failover",
               "name": "monitor",
               "type": "monitor"
           },
           {
               "uri": "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer",
               "name": "default",
               "type": "formation"
           }
          ]

   Multi-host Postgres connection strings
        PostgreSQL since version 10 includes support for multiple hosts in its connection driver libpq, with the
        special target_session_attrs connection property.

        This multi-host connection string facility allows applications to keep using the same stable connection
        string over server-side failovers. That's why pg_autoctl show uri uses that format.
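
        For illustration, any libpq based client (Postgres 10 or later) can use such a connection string
        directly; here with psql and the formation URI shown in the examples above, the query is routed to the
        current primary node:

           $ psql "postgres://localhost:5503,localhost:5502,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer" \
                  -c 'SELECT pg_is_in_recovery();'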

   pg_autoctl show events
        pg_autoctl show events - Prints the monitor's events about nodes in a given formation and group

   Synopsis
        This command outputs the events that the pg_auto_failover monitor records about state changes of the
        pg_auto_failover nodes it manages:

          usage: pg_autoctl show events  [ --pgdata --formation --group --count ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --count       how many events to fetch, defaults to 10
          --watch       display an auto-updating dashboard
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

       --formation
              List the events recorded for nodes in the given formation. Defaults to default.

       --count
              By default only the last 10 events are printed.

       --watch
              Take  control of the terminal and display the current state of the system and the last events from
              the monitor. The display is updated automatically every  500  milliseconds  (half  a  second)  and
              reacts properly to window size change.

              Depending  on the terminal window size, a different set of columns is visible in the state part of
              the output. See pg_autoctl_watch.

        --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show events --count 2 --json
          [
           {
               "nodeid": 1,
               "eventid": 15,
               "groupid": 0,
               "nodehost": "localhost",
               "nodename": "node1",
               "nodeport": 5501,
               "eventtime": "2021-03-18T12:32:36.103467+01:00",
               "goalstate": "primary",
               "description": "Setting goal state of node 1 \"node1\" (localhost:5501) to primary now that at least one secondary candidate node is healthy.",
               "formationid": "default",
               "reportedlsn": "0/4000060",
               "reportedstate": "wait_primary",
               "reportedrepstate": "async",
               "candidatepriority": 50,
               "replicationquorum": true
           },
           {
               "nodeid": 1,
               "eventid": 16,
               "groupid": 0,
               "nodehost": "localhost",
               "nodename": "node1",
               "nodeport": 5501,
               "eventtime": "2021-03-18T12:32:36.215494+01:00",
               "goalstate": "primary",
               "description": "New state is reported by node 1 \"node1\" (localhost:5501): \"primary\"",
               "formationid": "default",
               "reportedlsn": "0/4000110",
               "reportedstate": "primary",
               "reportedrepstate": "quorum",
               "candidatepriority": 50,
               "replicationquorum": true
           }
          ]

   pg_autoctl show state
       pg_autoctl show state - Prints monitor's state of nodes in a given formation and group

   Synopsis
       This command outputs the current state of the formation and groups  registered  to  the  pg_auto_failover
       monitor:

          usage: pg_autoctl show state  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --local       show local data, do not connect to the monitor
          --watch       display an auto-updating dashboard
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

       --formation
               List the state of the nodes in the given formation. Defaults to default.

       --group
               Limit output to a single group in the formation. Defaults to including all groups registered in
               the target formation.

       --local
              Print the local state information without connecting to the monitor.

       --watch
              Take  control of the terminal and display the current state of the system and the last events from
              the monitor. The display is updated automatically every  500  milliseconds  (half  a  second)  and
              reacts properly to window size change.

              Depending  on the terminal window size, a different set of columns is visible in the state part of
              the output. See pg_autoctl_watch.

        --json Output JSON formatted data instead of a table formatted list.

   Description
       The pg_autoctl show state output includes the following columns:

          • Name
                Name of the node.

          • Node
                Node information. When the formation has a single group (group  zero),  then  this  column  only
                contains the nodeId.

                Only  Citus  formations  allow  several  groups.  When  using  a Citus formation the Node column
                contains the groupId and the nodeId, separated by a colon, such as 0:1 for the first coordinator
                node.

          • Host:Port
                Hostname and port number used to connect to the node.

          • TLI: LSN
                Timeline identifier (TLI) and Postgres Log Sequence Number (LSN).

                The LSN is the current position in the Postgres WAL stream. This is a  hexadecimal  number.  See
                pg_lsn for more information.

                The  current  timeline  is incremented each time a failover happens, or when doing Point In Time
                Recovery. A node can only reach the secondary state when it is  on  the  same  timeline  as  its
                primary node.

          • Connection
                 This output field contains two bits of information. First, the Postgres connection type that the
                 node provides, either read-write or read-only. Then the mark ! is added when the monitor has
                 failed to connect to this node, and ? when the monitor has not connected to the node yet.

          • Reported State
                The latest reported FSM state, as reported to the monitor by the pg_autoctl process  running  on
                the Postgres node.

          • Assigned State
                 The assigned FSM state on the monitor. When the assigned state is not the same as the reported
                 state, then the pg_autoctl process running on the Postgres node might not have retrieved the
                 assigned state yet, or might still be implementing the FSM transition from the current state to
                 the assigned state.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
          ------+-------+----------------+----------------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 |   1: 0/4000678 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 |   1: 0/4000678 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 |   1: 0/4000678 |    read-only |           secondary |           secondary

          $ pg_autoctl show state --local
           Name |  Node |      Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
          ------+-------+----------------+----------------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 |   1: 0/4000678 | read-write ? |             primary |             primary

          $ pg_autoctl show state --json
          [
              {
                  "health": 1,
                  "node_id": 1,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node1",
                  "nodeport": 5501,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "primary",
                  "assigned_group_state": "primary"
              },
              {
                  "health": 1,
                  "node_id": 2,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node2",
                  "nodeport": 5502,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "secondary",
                  "assigned_group_state": "secondary"
              },
              {
                  "health": 1,
                  "node_id": 3,
                  "group_id": 0,
                  "nodehost": "localhost",
                  "nodename": "node3",
                  "nodeport": 5503,
                  "reported_lsn": "0/4000678",
                  "reported_tli": 1,
                  "formation_kind": "pgsql",
                  "candidate_priority": 50,
                  "replication_quorum": true,
                  "current_group_state": "secondary",
                  "assigned_group_state": "secondary"
              }
          ]

   pg_autoctl show settings
       pg_autoctl show settings - Print replication settings for a formation from the monitor

   Synopsis
        This command allows reviewing all the replication settings of a given formation (defaults to 'default' as
        usual):

          usage: pg_autoctl show settings  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --monitor     pg_auto_failover Monitor Postgres URL
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl_get_formation_settings which is a synonym.

       The output contains settings and values that apply in different contexts, as shown here with a formation
       of four nodes, where node_4 is not participating in the replication quorum and is also not a candidate
       for failover:

          $ pg_autoctl show settings
             Context |    Name |                   Setting | Value
           ----------+---------+---------------------------+-------------------------------------------------------------
           formation | default |      number_sync_standbys | 1
             primary |  node_1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_2)'
                node |  node_1 |        replication quorum | true
                node |  node_2 |        replication quorum | true
                node |  node_3 |        replication quorum | true
                node |  node_4 |        replication quorum | false
                node |  node_1 |        candidate priority | 50
                node |  node_2 |        candidate priority | 50
                node |  node_3 |        candidate priority | 50
                node |  node_4 |        candidate priority | 0

       Three replication settings contexts are listed:

          1. The "formation" context contains a single entry, the value of number_sync_standbys for  the  target
             formation.

          2. The "primary" context contains one entry per group of Postgres nodes in the formation, and shows
             the current value of the synchronous_standby_names Postgres setting as computed by the monitor. It
             should match what's currently set on the primary node, unless a change is currently being applied,
             as shown by the primary being in the APPLY_SETTINGS state.

          3. The "node" context contains two entries per node: one line shows the replication quorum setting of
             the node, and another line shows its candidate priority.

       This command gives an overview of all the settings that apply to the current formation.

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
              Show the current replication settings for the given formation. Defaults to the default formation.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show settings
               Context |    Name |                   Setting | Value
             ----------+---------+---------------------------+-------------------------------------------------------------
             formation | default |      number_sync_standbys | 1
               primary |   node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
                  node |   node1 |        candidate priority | 50
                  node |   node2 |        candidate priority | 50
                  node |   node3 |        candidate priority | 50
                  node |   node1 |        replication quorum | true
                  node |   node2 |        replication quorum | true
                  node |   node3 |        replication quorum | true
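
       When using the --json option, the same information is available as a single JSON document, which is easy
       to filter in scripts. As a sketch, assuming the jq tool is installed and the three-node setup used in the
       examples above, the candidate priority of every node can be extracted from the "nodes" array (the JSON
       layout is the same as the one shown later for the synonym command pg_autoctl get formation settings):

          $ pg_autoctl show settings --json | jq -r '.nodes[] | select(.setting == "candidate priority") | "\(.nodename) \(.value)"'
          node1 50
          node2 50
          node3 50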

   pg_autoctl show standby-names
       pg_autoctl show standby-names - Prints synchronous_standby_names for a given group

   Synopsis
       This  command  prints  the current value for synchronous_standby_names for the primary Postgres server of
       the target group (default 0) in the target formation (default default), as computed by the monitor:

          usage: pg_autoctl show standby-names  [ --pgdata ] --formation --group

            --pgdata      path to data directory
            --monitor     pg_auto_failover Monitor Postgres URL
            --formation   formation to query, defaults to 'default'
            --group       group to query in the formation, defaults to all
            --json        output data in the JSON format

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres URI used to connect to the monitor. Must use the autoctl_node  username  and  target  the
              pg_auto_failover  database  name.  It  is  possible to show the Postgres URI from the monitor node
              using the command pg_autoctl_show_uri.

              Defaults to the value of the environment variable PG_AUTOCTL_MONITOR.

       --formation
              Show the current synchronous_standby_names value for the given formation. Defaults to the  default
              formation.

       --group
              Show  the  current  synchronous_standby_names  value  for  the given group in the given formation.
              Defaults to group 0.

       --json Output JSON formatted data instead of a table formatted list.

   Examples
          $ pg_autoctl show standby-names
          'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'

          $ pg_autoctl show standby-names --json
          {
              "formation": "default",
              "group": 0,
              "synchronous_standby_names": "ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)"
          }
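
       The --formation and --group options allow targeting another group explicitly. A minimal sketch, spelling
       out the documented defaults:

          $ pg_autoctl show standby-names --formation default --group 0
          'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'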

   pg_autoctl show file
       pg_autoctl show file - List pg_autoctl internal files (config, state, pid)

   Synopsis
       This command lists the files that pg_autoctl uses internally for its own configuration, state, and pid:

          usage: pg_autoctl show file  [ --pgdata --all --config | --state | --init | --pid --contents ]

          --pgdata      path to data directory
          --all         show all pg_autoctl files
          --config      show pg_autoctl configuration file
          --state       show pg_autoctl state file
          --init        show pg_autoctl initialisation state file
          --pid         show pg_autoctl PID file
          --contents    show selected file contents
          --json        output data in the JSON format

   Description
       The pg_autoctl command follows  the  XDG  Base  Directory  Specification  and  places  its  internal  and
       configuration files by default in places such as ~/.config/pg_autoctl and ~/.local/share/pg_autoctl.

       It  is  possible  to change the default XDG locations by using the environment variables XDG_CONFIG_HOME,
       XDG_DATA_HOME, and XDG_RUNTIME_DIR.

       Also, pg_autoctl uses sub-directories that are specific to a given PGDATA, making it possible to run
       several Postgres nodes on the same machine, which is very practical for testing and development purposes,
       though not advised for production setups.
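
       As a sketch, with a hypothetical XDG_CONFIG_HOME of /tmp/xdg/config and a Postgres node serving the data
       directory /data/pgsql, the configuration file path embeds the full PGDATA path under that location:

          $ export XDG_CONFIG_HOME=/tmp/xdg/config
          $ pg_autoctl show file --config --pgdata /data/pgsql
          /tmp/xdg/config/pg_autoctl/data/pgsql/pg_autoctl.cfg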

   Configuration File
       The  pg_autoctl  configuration file for an instance serving the data directory at /data/pgsql is found at
       ~/.config/pg_autoctl/data/pgsql/pg_autoctl.cfg, written in the INI format.

       It is possible to get the location of the configuration file by using the command  pg_autoctl  show  file
       --config  --pgdata  /data/pgsql  and  to  output  its  content  by using the command pg_autoctl show file
       --config --contents --pgdata /data/pgsql.

       See also pg_autoctl_config_get and pg_autoctl_config_set.
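
       As an illustration, here is an abbreviated sketch of such a configuration file. The key names in the
       [pg_autoctl] section are taken from the JSON projection shown later in this page (jq .pg_autoctl); a real
       file contains more sections and keys, and the values shown are the ones from the node1 test setup used in
       the examples below:

          $ pg_autoctl show file --config --contents --pgdata /data/pgsql
          [pg_autoctl]
          role = keeper
          monitor = postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer
          formation = default
          group = 0
          name = node1
          hostname = localhost
          nodekind = standalone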

   State File
       The pg_autoctl state file for an  instance  serving  the  data  directory  at  /data/pgsql  is  found  at
       ~/.local/share/pg_autoctl/data/pgsql/pg_autoctl.state, written in a specific binary format.

       This file is not intended to be written by anything other than pg_autoctl itself. In case of state
       corruption, see the troubleshooting section of the documentation.

       It is possible to get the location of the state file by using the command pg_autoctl  show  file  --state
       --pgdata  /data/pgsql  and  to  output  its  content  by  using  the command pg_autoctl show file --state
       --contents --pgdata /data/pgsql.

   Init State File
       The pg_autoctl init state file for an instance serving the data directory  at  /data/pgsql  is  found  at
       ~/.local/share/pg_autoctl/data/pgsql/pg_autoctl.init, written in a specific binary format.

       This file is not intended to be written by anything other than pg_autoctl itself. In case of state
       corruption, see the troubleshooting section of the documentation.

       This initialization state file only exists during the  initialization  of  a  pg_auto_failover  node.  In
       normal operations, this file does not exist.

       It  is  possible  to  get the location of the state file by using the command pg_autoctl show file --init
       --pgdata /data/pgsql and to output  its  content  by  using  the  command  pg_autoctl  show  file  --init
       --contents --pgdata /data/pgsql.

   PID File
       The  pg_autoctl  PID  file  for  an  instance  serving  the  data  directory  at  /data/pgsql is found at
       /tmp/pg_autoctl/data/pgsql/pg_autoctl.pid, written in a specific text format.

       The PID file is located in a temporary directory by default, or in the XDG_RUNTIME_DIR directory when
       this is set up.
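
       A minimal sketch, re-using the /data/pgsql example path from the sections above:

          $ pg_autoctl show file --pid --pgdata /data/pgsql
          /tmp/pg_autoctl/data/pgsql/pg_autoctl.pid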

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --all  List all the files that belong to this pg_autoctl node.

       --config
              Show only the configuration file.

       --state
              Show only the state file.

       --init Show only the init state file, which only exists while the command pg_autoctl create postgres or
              the command pg_autoctl create monitor is running, or when that command failed (and can then be
              retried).

       --pid  Show only the pid file.

       --contents
              When  one  of the options to show a specific file is in use, then --contents shows the contents of
              the selected file instead of showing its absolute file path.

       --json Output JSON formatted data.

   Examples
       The following examples are taken from a QA environment that has been prepared thanks to the make  cluster
       command  made  available to the pg_auto_failover contributors. As a result, the XDG environment variables
       have been tweaked to obtain a self-contained test:

          $  tmux show-env | grep XDG
          XDG_CONFIG_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/config
          XDG_DATA_HOME=/Users/dim/dev/MS/pg_auto_failover/tmux/share
          XDG_RUNTIME_DIR=/Users/dim/dev/MS/pg_auto_failover/tmux/run

       Within that self-contained test location, we can see the following examples.

          $ pg_autoctl show file --pgdata ./node1
             File | Path
          --------+----------------
           Config | /Users/dim/dev/MS/pg_auto_failover/tmux/config/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.cfg
            State | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state
             Init | /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.init
              Pid | /Users/dim/dev/MS/pg_auto_failover/tmux/run/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.pid

          $ pg_autoctl show file --pgdata node1 --state
          /Users/dim/dev/MS/pg_auto_failover/tmux/share/pg_autoctl/Users/dim/dev/MS/pg_auto_failover/tmux/node1/pg_autoctl.state

          $ pg_autoctl show file --pgdata node1 --state --contents
          Current Role:             primary
          Assigned Role:            primary
          Last Monitor Contact:     Thu Mar 18 17:32:25 2021
          Last Secondary Contact:   0
          pg_autoctl state version: 1
          group:                    0
          node id:                  1
          nodes version:            0
          PostgreSQL Version:       1201
          PostgreSQL CatVersion:    201909212
          PostgreSQL System Id:     6940955496243696337

          $ pg_autoctl show file --pgdata node1 --config --contents --json | jq .pg_autoctl
          {
            "role": "keeper",
            "monitor": "postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer",
            "formation": "default",
            "group": 0,
            "name": "node1",
            "hostname": "localhost",
            "nodekind": "standalone"
          }

   pg_autoctl show systemd
       pg_autoctl show systemd - Print systemd service file for this node

   Synopsis
       This command outputs a configuration unit that is  suitable  for  registering  pg_autoctl  as  a  systemd
       service.

   Examples
          $ pg_autoctl show systemd --pgdata node1
          17:38:29 99778 INFO  HINT: to complete a systemd integration, run the following commands:
          17:38:29 99778 INFO  pg_autoctl -q show systemd --pgdata "node1" | sudo tee /etc/systemd/system/pgautofailover.service
          17:38:29 99778 INFO  sudo systemctl daemon-reload
          17:38:29 99778 INFO  sudo systemctl enable pgautofailover
          17:38:29 99778 INFO  sudo systemctl start pgautofailover
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /Users/dim
          Environment = 'PGDATA=node1'
          User = dim
          ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0
          ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload

          [Install]
          WantedBy = multi-user.target

       To avoid the logs output, use the -q option:

          $ pg_autoctl show systemd --pgdata node1 -q
          [Unit]
          Description = pg_auto_failover

          [Service]
          WorkingDirectory = /Users/dim
          Environment = 'PGDATA=node1'
          User = dim
          ExecStart = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl run
          Restart = always
          StartLimitBurst = 0
          ExecReload = /Applications/Postgres.app/Contents/Versions/12/bin/pg_autoctl reload

          [Install]
          WantedBy = multi-user.target
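
       Putting it together, the systemd integration then boils down to the commands listed in the HINT above,
       which can be run verbatim (they require sudo):

          $ pg_autoctl -q show systemd --pgdata "node1" | sudo tee /etc/systemd/system/pgautofailover.service
          $ sudo systemctl daemon-reload
          $ sudo systemctl enable pgautofailover
          $ sudo systemctl start pgautofailover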

   pg_autoctl enable
       pg_autoctl enable - Enable a feature on a formation

   pg_autoctl enable secondary
       pg_autoctl enable secondary - Enable secondary nodes on a formation

   Synopsis
       This feature makes the most sense when using the Enterprise Edition of pg_auto_failover, which is fully
       compatible with Citus formations. When secondary is enabled, the Citus worker creation policy is to
       assign a primary node and then a standby node for each group. When secondary is disabled, the Citus
       worker creation policy is to assign only the primary nodes.

           usage: pg_autoctl enable secondary  [ --pgdata --formation ]

          --pgdata      path to data directory
          --formation   Formation to enable secondary on

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --formation
              Target formation in which to enable the secondary feature.
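
       As a minimal sketch, enabling the secondary feature on the documented default formation looks like this:

          $ pg_autoctl enable secondary --formation default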

   pg_autoctl enable maintenance
       pg_autoctl enable maintenance - Enable Postgres maintenance mode on this node

   Synopsis
       A pg_auto_failover node can be put into a maintenance state. The Postgres node is then still registered
       to the monitor, and is known to be unreliable until maintenance is disabled. A node in the maintenance
       state is not a candidate for promotion.

       Typical uses of the maintenance state include Operating System or Postgres reboots, e.g. when applying
       security upgrades.

           usage: pg_autoctl enable maintenance  [ --pgdata --allow-failover ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --formation
              Target formation in which to enable the maintenance feature.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000760 |    read-only |           secondary |           secondary

          $ pg_autoctl enable maintenance --pgdata node3
          12:06:12 47086 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:06:12 47086 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:06:12 | node1 |     1 | localhost:5501 |             primary |        join_primary
          12:06:12 | node3 |     3 | localhost:5503 |           secondary |    wait_maintenance
          12:06:12 | node3 |     3 | localhost:5503 |    wait_maintenance |    wait_maintenance
          12:06:12 | node1 |     1 | localhost:5501 |        join_primary |        join_primary
          12:06:12 | node3 |     3 | localhost:5503 |    wait_maintenance |         maintenance
          12:06:12 | node1 |     1 | localhost:5501 |        join_primary |             primary
          12:06:13 | node3 |     3 | localhost:5503 |         maintenance |         maintenance

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |         none |         maintenance |         maintenance

   pg_autoctl enable ssl
       pg_autoctl enable ssl - Enable SSL configuration on this node

   Synopsis
       It is possible to manage Postgres SSL settings with the pg_autoctl command, both at
       pg_autoctl_create_postgres time and later at run-time, to change your mind and update the SSL settings.

           usage: pg_autoctl enable ssl  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --ssl-self-signed
              Generate SSL self-signed certificates to provide network encryption. This does not protect against
              man-in-the-middle kinds of attacks. See security for more about our SSL settings.

       --ssl-mode
              SSL  Mode  used  by  pg_autoctl  when  connecting  to  other  nodes, including when connecting for
              streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.
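
       As a minimal sketch, switching a running node to self-signed certificates uses the options documented
       above; node1 is the data directory used in the other examples of this page, and the chosen --ssl-mode
       value is only an illustration:

          $ pg_autoctl enable ssl --ssl-self-signed --ssl-mode require --pgdata node1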

   pg_autoctl enable monitor
       pg_autoctl enable monitor - Enable a monitor for this node to be orchestrated from

   Synopsis
       It is possible to disable the pg_auto_failover monitor and enable it again online in a running pg_autoctl
       Postgres node. The main use case where this operation is useful is when the monitor node has to be
       replaced, either after a full crash of the previous monitor node, or for migrating to a new monitor node
       (hardware replacement, region or zone migration, etc).

           usage: pg_autoctl enable monitor  [ --pgdata --allow-failover ] postgres://autoctl_node@new.monitor.add.ress/pg_auto_failover

          --pgdata      path to data directory

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary

          $ pg_autoctl enable monitor --pgdata node3 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=require'
          12:42:07 43834 INFO  Registered node 3 (localhost:5503) with name "node3" in formation "default", group 0, state "wait_standby"
          12:42:07 43834 INFO  Successfully registered to the monitor with nodeId 3
          12:42:08 43834 INFO  Still waiting for the monitor to drive us to state "catchingup"
          12:42:08 43834 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |    read-only |           secondary |           secondary

   pg_autoctl disable
       pg_autoctl disable - Disable a feature on a formation

   pg_autoctl disable secondary
       pg_autoctl disable secondary - Disable secondary nodes on a formation

   Synopsis
       This feature makes the most sense when using the Enterprise Edition of pg_auto_failover, which is fully
       compatible with Citus formations. When secondary is enabled, the Citus worker creation policy is to
       assign a primary node and then a standby node for each group. When secondary is disabled, the Citus
       worker creation policy is to assign only the primary nodes.

           usage: pg_autoctl disable secondary  [ --pgdata --formation ]

          --pgdata      path to data directory
          --formation   Formation to disable secondary on

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --formation
              Target formation in which to disable the secondary feature.

   pg_autoctl disable maintenance
       pg_autoctl disable maintenance - Disable Postgres maintenance mode on this node

   Synopsis
       A pg_auto_failover node can be put into a maintenance state. The Postgres node is then still registered
       to the monitor, and is known to be unreliable until maintenance is disabled. A node in the maintenance
       state is not a candidate for promotion.

       Typical uses of the maintenance state include Operating System or Postgres reboots, e.g. when applying
       security upgrades.

           usage: pg_autoctl disable maintenance  [ --pgdata --allow-failover ]

          --pgdata      path to data directory

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --formation
              Target formation in which to disable the maintenance feature.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000810 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000810 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000810 |         none |         maintenance |         maintenance

          $ pg_autoctl disable maintenance --pgdata node3
          12:06:37 47542 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:06:37 47542 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:06:37 | node3 |     3 | localhost:5503 |         maintenance |          catchingup
          12:06:37 | node3 |     3 | localhost:5503 |          catchingup |          catchingup
          12:06:37 | node3 |     3 | localhost:5503 |          catchingup |           secondary
          12:06:37 | node3 |     3 | localhost:5503 |           secondary |           secondary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000848 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000848 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000000 |    read-only |           secondary |           secondary

   pg_autoctl disable ssl
       pg_autoctl disable ssl - Disable SSL configuration on this node

   Synopsis
       It is possible to manage Postgres SSL settings with the pg_autoctl command, both at
       pg_autoctl_create_postgres time and later at run-time, to change your mind and update the SSL settings.

           usage: pg_autoctl disable ssl  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --ssl-self-signed setup network encryption using self signed certificates (does NOT protect against MITM)
          --ssl-mode        use that sslmode in connection strings
          --ssl-ca-file     set the Postgres ssl_ca_file to that file path
          --ssl-crl-file    set the Postgres ssl_crl_file to that file path
          --no-ssl          don't enable network encryption (NOT recommended, prefer --ssl-self-signed)
          --server-key      set the Postgres ssl_key_file to that file path
          --server-cert     set the Postgres ssl_cert_file to that file path

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --ssl-self-signed
              Generate SSL self-signed certificates to provide network encryption. This does not protect against
              man-in-the-middle kinds of attacks. See security for more about our SSL settings.

       --ssl-mode
              SSL Mode used by pg_autoctl  when  connecting  to  other  nodes,  including  when  connecting  for
              streaming replication.

       --ssl-ca-file
              Set the Postgres ssl_ca_file to that file path.

       --ssl-crl-file
              Set the Postgres ssl_crl_file to that file path.

       --no-ssl
              Don't enable network encryption. This is not recommended, prefer --ssl-self-signed.

       --server-key
              Set the Postgres ssl_key_file to that file path.

       --server-cert
              Set the Postgres ssl_cert_file to that file path.

   pg_autoctl disable monitor
       pg_autoctl disable monitor - Disable the monitor for this node

   Synopsis
       It is possible to disable the pg_auto_failover monitor and enable it again online in a running pg_autoctl
       Postgres node. The main use case where this operation is useful is when the monitor node has to be
       replaced, either after a full crash of the previous monitor node, or for migrating to a new monitor node
       (hardware replacement, region or zone migration, etc).

           usage: pg_autoctl disable monitor  [ --pgdata --force ]

          --pgdata      path to data directory
          --force       force unregistering from the monitor

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --force
              The --force option covers the following two situations:

                 1. By default, the command expects to be able to connect to the current monitor. When the
                    current known monitor in the setup is not running anymore, use --force to skip this step.

                 2. When pg_autoctl could connect to the monitor and the node is found there, this is normally
                    an error that prevents disabling the monitor. Using --force allows the command to drop the
                    node from the monitor and continue with disabling the monitor.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000148 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000148 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/4000148 |    read-only |           secondary |           secondary

          $ pg_autoctl disable monitor --pgdata node3
          12:41:21 43039 INFO  Found node 3 "node3" (localhost:5503) on the monitor
          12:41:21 43039 FATAL Use --force to remove the node from the monitor

          $ pg_autoctl disable monitor --pgdata node3 --force
          12:41:32 43219 INFO  Removing node 3 "node3" (localhost:5503) from monitor

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000760 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/4000760 |    read-only |           secondary |           secondary

   pg_autoctl get
       pg_autoctl get - Get a pg_auto_failover node, or formation setting

   pg_autoctl get formation settings
       pg_autoctl get formation settings - get replication settings for a formation from the monitor

   Synopsis
       This command prints the pg_autoctl replication settings:

          usage: pg_autoctl get formation settings  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl_show_settings which is a synonym.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

   Examples
          $ pg_autoctl get formation settings
            Context |    Name |                   Setting | Value
          ----------+---------+---------------------------+-------------------------------------------------------------
          formation | default |      number_sync_standbys | 1
            primary |   node1 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'
               node |   node1 |        candidate priority | 50
               node |   node2 |        candidate priority | 50
               node |   node3 |        candidate priority | 50
               node |   node1 |        replication quorum | true
               node |   node2 |        replication quorum | true
               node |   node3 |        replication quorum | true

          $ pg_autoctl get formation settings --json
          {
              "nodes": [
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 1,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node1"
                  },
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 2,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node2"
                  },
                  {
                      "value": "true",
                      "context": "node",
                      "node_id": 3,
                      "setting": "replication quorum",
                      "group_id": 0,
                      "nodename": "node3"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 1,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node1"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 2,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node2"
                  },
                  {
                      "value": "50",
                      "context": "node",
                      "node_id": 3,
                      "setting": "candidate priority",
                      "group_id": 0,
                      "nodename": "node3"
                  }
              ],
              "primary": [
                  {
                      "value": "'ANY 1 (pgautofailover_standby_2, pgautofailover_standby_3)'",
                      "context": "primary",
                      "node_id": 1,
                      "setting": "synchronous_standby_names",
                      "group_id": 0,
                      "nodename": "node1"
                  }
              ],
              "formation": {
                  "value": "1",
                  "context": "formation",
                  "node_id": null,
                  "setting": "number_sync_standbys",
                  "group_id": null,
                  "nodename": "default"
              }
          }

   pg_autoctl get formation number-sync-standbys
       pg_autoctl get formation number-sync-standbys - get number_sync_standbys for a formation from the monitor

   Synopsis
       This command prints the pg_autoctl replication setting for number sync standbys:

          usage: pg_autoctl get formation number-sync-standbys  [ --pgdata ] [ --json ] [ --formation ]

          --pgdata      path to data directory
          --json        output data in the JSON format
          --formation   pg_auto_failover formation

   Description
       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

   Examples
          $ pg_autoctl get formation number-sync-standbys
          1

          $ pg_autoctl get formation number-sync-standbys --json
          {
              "number-sync-standbys": 1
          }

   pg_autoctl get node replication-quorum
       pg_autoctl get replication-quorum - get replication-quorum property from the monitor

   Synopsis
       This command prints the pg_autoctl replication quorum for a given node:

          usage: pg_autoctl get node replication-quorum  [ --pgdata ] [ --json ] [ --formation ] [ --name ]

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl get node replication-quorum --name node1
          true

          $ pg_autoctl get node replication-quorum --name node1 --json
          {
              "name": "node1",
              "replication-quorum": true
          }

   pg_autoctl get node candidate-priority
       pg_autoctl get candidate-priority - get candidate-priority property from the monitor

   Synopsis
       This command prints pg_autoctl candidate priority for a given node:

          usage: pg_autoctl get node candidate-priority  [ --pgdata ] [ --json ] [ --formation ] [ --name ]

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl get node candidate-priority --name node1
          50

          $ pg_autoctl get node candidate-priority --name node1 --json
          {
              "name": "node1",
              "candidate-priority": 50
          }

   pg_autoctl set
       pg_autoctl set - Set a pg_auto_failover node, or formation setting

   pg_autoctl set formation number-sync-standbys
       pg_autoctl set formation number-sync-standbys - set number_sync_standbys for a formation from the monitor

   Synopsis
       This command sets the pg_autoctl replication setting for number sync standbys:

          usage: pg_autoctl set formation number-sync-standbys  [ --pgdata ] [ --json ] [ --formation ] <number_sync_standbys>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --json        output data in the JSON format

   Description
       The pg_auto_failover monitor ensures that at least  N+1  candidate  standby  nodes  are  registered  when
       number-sync-standbys  is  N.  This means that to be able to run the following command, at least 3 standby
       nodes with a non-zero candidate priority must be registered to the monitor:

          $ pg_autoctl set formation number-sync-standbys 2

       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.
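
       As a sketch, and assuming enough standby nodes with a non-zero candidate priority are registered (see the
       Description above), the setting can be changed and then read back with the matching get command:

          $ pg_autoctl set formation number-sync-standbys 2
          $ pg_autoctl get formation number-sync-standbys
          2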

   pg_autoctl set node replication-quorum
       pg_autoctl set replication-quorum - set replication-quorum property from the monitor

   Synopsis
       This command sets pg_autoctl replication quorum for a given node:

          usage: pg_autoctl set node replication-quorum  [ --pgdata ] [ --json ] [ --formation ] [ --name ] <true|false>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl set node replication-quorum --name node1 false
          12:49:37 94092 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:49:37 94092 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:49:37 94092 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:49:37 94092 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          false

          $ pg_autoctl set node replication-quorum --name node1 true --json
          12:49:42 94199 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:49:42 94199 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:49:42 94199 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:49:43 94199 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          {
              "replication-quorum": true
          }

   pg_autoctl set node candidate-priority
       pg_autoctl set candidate-priority - set candidate-priority property from the monitor

   Synopsis
       This command sets the pg_autoctl candidate priority for a given node:

          usage: pg_autoctl set node candidate-priority  [ --pgdata ] [ --json ] [ --formation ] [ --name ] <priority: 0..100>

          --pgdata      path to data directory
          --formation   pg_auto_failover formation
          --name        pg_auto_failover node name
          --json        output data in the JSON format

   Description
       See also pg_autoctl_show_settings for the full list of replication settings.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data.

       --formation
              Show replication settings for given formation. Defaults to default.

       --name Show replication settings for given node, selected by name.

   Examples
          $ pg_autoctl set node candidate-priority --name node1 65
          12:47:59 92326 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:47:59 92326 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:47:59 92326 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:47:59 92326 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          65

          $ pg_autoctl set node candidate-priority --name node1 50 --json
          12:48:05 92450 INFO  Waiting for the settings to have been applied to the monitor and primary node
          12:48:05 92450 INFO  New state is reported by node 1 "node1" (localhost:5501): "apply_settings"
          12:48:05 92450 INFO  Setting goal state of node 1 "node1" (localhost:5501) to primary after it applied replication properties change.
          12:48:05 92450 INFO  New state is reported by node 1 "node1" (localhost:5501): "primary"
          {
              "candidate-priority": 50
          }

   pg_autoctl perform
       pg_autoctl perform - Perform an action orchestrated by the monitor

   pg_autoctl perform failover
       pg_autoctl perform failover - Perform a failover for given formation and group

   Synopsis
       This command starts a Postgres failover orchestration from the pg_auto_failover monitor:

          usage: pg_autoctl perform failover  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --group       group to target, defaults to 0
          --wait        how many seconds to wait, default to 60

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual failover, sometimes also known as a
       switchover. When doing so, split-brain situations are prevented thanks to intermediary states being used
       in the Finite State Machine.

       The pg_autoctl perform failover command waits until the failover is known complete  on  the  monitor,  or
       until the hard-coded 60s timeout has passed.

       The  failover  orchestration  is done in the background by the monitor, so even if the pg_autoctl perform
       failover stops on the timeout, the failover orchestration continues at the monitor.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --group
              Postgres group to target for the operation. Defaults to 0, only Citus  formations  may  have  more
              than one group.

       --wait How  many  seconds  to  wait  for  notifications  about  the promotion. The command stops when the
              promotion is finished (a node is primary), or when the timeout has elapsed, whichever comes first.
              The value 0 (zero) disables the timeout and allows the command to wait forever.

   Examples
          $ pg_autoctl perform failover
          12:57:30 3635 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          12:57:30 3635 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          12:57:30 | node1 |     1 | localhost:5501 |             primary |            draining
          12:57:30 | node1 |     1 | localhost:5501 |            draining |            draining
          12:57:30 | node2 |     2 | localhost:5502 |           secondary |          report_lsn
          12:57:30 | node3 |     3 | localhost:5503 |           secondary |          report_lsn
          12:57:36 | node3 |     3 | localhost:5503 |          report_lsn |          report_lsn
          12:57:36 | node2 |     2 | localhost:5502 |          report_lsn |          report_lsn
          12:57:36 | node2 |     2 | localhost:5502 |          report_lsn |   prepare_promotion
          12:57:36 | node2 |     2 | localhost:5502 |   prepare_promotion |   prepare_promotion
          12:57:36 | node2 |     2 | localhost:5502 |   prepare_promotion |    stop_replication
          12:57:36 | node1 |     1 | localhost:5501 |            draining |      demote_timeout
          12:57:36 | node3 |     3 | localhost:5503 |          report_lsn |      join_secondary
          12:57:36 | node1 |     1 | localhost:5501 |      demote_timeout |      demote_timeout
          12:57:36 | node3 |     3 | localhost:5503 |      join_secondary |      join_secondary
          12:57:37 | node2 |     2 | localhost:5502 |    stop_replication |    stop_replication
          12:57:37 | node2 |     2 | localhost:5502 |    stop_replication |        wait_primary
          12:57:37 | node1 |     1 | localhost:5501 |      demote_timeout |             demoted
          12:57:37 | node1 |     1 | localhost:5501 |             demoted |             demoted
          12:57:37 | node2 |     2 | localhost:5502 |        wait_primary |        wait_primary
          12:57:37 | node3 |     3 | localhost:5503 |      join_secondary |           secondary
          12:57:37 | node1 |     1 | localhost:5501 |             demoted |          catchingup
          12:57:38 | node3 |     3 | localhost:5503 |           secondary |           secondary
          12:57:38 | node2 |     2 | localhost:5502 |        wait_primary |             primary
          12:57:38 | node1 |     1 | localhost:5501 |          catchingup |          catchingup
          12:57:38 | node2 |     2 | localhost:5502 |             primary |             primary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000F50 |    read-only |           secondary |           secondary
          node2 |     2 | localhost:5502 | 0/4000F50 |   read-write |             primary |             primary
          node3 |     3 | localhost:5503 | 0/4000F50 |    read-only |           secondary |           secondary

   pg_autoctl perform switchover
       pg_autoctl perform switchover - Perform a switchover for given formation and group

   Synopsis
       This command starts a Postgres switchover orchestration from the pg_auto_failover monitor:

          usage: pg_autoctl perform switchover  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --group       group to target, defaults to 0

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual switchover, sometimes also known as a
       failover. When doing so, split-brain situations are prevented thanks to intermediary states being used
       in the Finite State Machine.

       The pg_autoctl perform switchover command waits until the switchover is known complete on the monitor, or
       until the hard-coded 60s timeout has passed.

       The switchover orchestration is done in the background by the monitor, so even if the pg_autoctl  perform
       switchover stops on the timeout, the switchover orchestration continues at the monitor.

       See also pg_autoctl_perform_failover, a synonym for this command.

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --group
              Postgres  group  to  target  for the operation. Defaults to 0, only Citus formations may have more
              than one group.
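
       As this command is a synonym for pg_autoctl perform failover, a minimal sketch looks the same and
       produces the same kind of notifications table as shown in the failover examples above:

          $ pg_autoctl perform switchover --formation default --group 0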

   pg_autoctl perform promotion
       pg_autoctl perform promotion - Perform a failover that promotes a target node

   Synopsis
       This command starts a Postgres failover orchestration from the pg_auto_failover monitor and targets a
       given node:

          usage: pg_autoctl perform promotion  [ --pgdata --formation --name --wait ]

          --pgdata      path to data directory
          --formation   formation to target, defaults to 'default'
          --name        node name to target, defaults to current node
          --wait        how many seconds to wait, default to 60

   Description
       The pg_auto_failover monitor can be used to orchestrate a manual promotion, sometimes also known as a
       switchover. When doing so, split-brain situations are prevented thanks to intermediary states being used
       in the Finite State Machine.

       The  pg_autoctl  perform promotion command waits until the promotion is known complete on the monitor, or
       until the hard-coded 60s timeout has passed.

       The promotion orchestration is done in the background by the monitor, so even if the  pg_autoctl  perform
       promotion stops on the timeout, the promotion orchestration continues at the monitor.

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --formation
              Formation to target for the operation. Defaults to default.

       --name Name of the node that should be elected as the new primary node.

       --wait How  many  seconds  to  wait  for  notifications  about  the promotion. The command stops when the
              promotion is finished (a node is primary), or when the timeout has elapsed, whichever comes first.
              The value 0 (zero) disables the timeout and allows the command to wait forever.

   Examples
          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/4000F88 |    read-only |           secondary |           secondary
          node2 |     2 | localhost:5502 | 0/4000F88 |   read-write |             primary |             primary
          node3 |     3 | localhost:5503 | 0/4000F88 |    read-only |           secondary |           secondary

          $ pg_autoctl perform promotion --name node1
          13:08:13 15297 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:08:13 15297 INFO  Following table displays times when notifications are received
              Time |  Name |  Node |      Host:Port |       Current State |      Assigned State
          ---------+-------+-------+----------------+---------------------+--------------------
          13:08:13 | node1 |   0/1 | localhost:5501 |           secondary |           secondary
          13:08:13 | node2 |   0/2 | localhost:5502 |             primary |            draining
          13:08:13 | node2 |   0/2 | localhost:5502 |            draining |            draining
          13:08:13 | node1 |   0/1 | localhost:5501 |           secondary |          report_lsn
          13:08:13 | node3 |   0/3 | localhost:5503 |           secondary |          report_lsn
          13:08:19 | node3 |   0/3 | localhost:5503 |          report_lsn |          report_lsn
          13:08:19 | node1 |   0/1 | localhost:5501 |          report_lsn |          report_lsn
          13:08:19 | node1 |   0/1 | localhost:5501 |          report_lsn |   prepare_promotion
          13:08:19 | node1 |   0/1 | localhost:5501 |   prepare_promotion |   prepare_promotion
          13:08:19 | node1 |   0/1 | localhost:5501 |   prepare_promotion |    stop_replication
          13:08:19 | node2 |   0/2 | localhost:5502 |            draining |      demote_timeout
          13:08:19 | node3 |   0/3 | localhost:5503 |          report_lsn |      join_secondary
          13:08:19 | node2 |   0/2 | localhost:5502 |      demote_timeout |      demote_timeout
          13:08:19 | node3 |   0/3 | localhost:5503 |      join_secondary |      join_secondary
          13:08:20 | node1 |   0/1 | localhost:5501 |    stop_replication |    stop_replication
          13:08:20 | node1 |   0/1 | localhost:5501 |    stop_replication |        wait_primary
          13:08:20 | node2 |   0/2 | localhost:5502 |      demote_timeout |             demoted
          13:08:20 | node1 |   0/1 | localhost:5501 |        wait_primary |        wait_primary
          13:08:20 | node3 |   0/3 | localhost:5503 |      join_secondary |           secondary
          13:08:20 | node2 |   0/2 | localhost:5502 |             demoted |             demoted
          13:08:20 | node2 |   0/2 | localhost:5502 |             demoted |          catchingup
          13:08:21 | node3 |   0/3 | localhost:5503 |           secondary |           secondary
          13:08:21 | node1 |   0/1 | localhost:5501 |        wait_primary |             primary
          13:08:21 | node2 |   0/2 | localhost:5502 |          catchingup |          catchingup
          13:08:21 | node1 |   0/1 | localhost:5501 |             primary |             primary

          $ pg_autoctl show state
           Name |  Node |      Host:Port |       LSN |   Connection |       Current State |      Assigned State
          ------+-------+----------------+-----------+--------------+---------------------+--------------------
          node1 |     1 | localhost:5501 | 0/40012F0 |   read-write |             primary |             primary
          node2 |     2 | localhost:5502 | 0/40012F0 |    read-only |           secondary |           secondary
          node3 |     3 | localhost:5503 | 0/40012F0 |    read-only |           secondary |           secondary

   pg_autoctl do
       pg_autoctl do - Internal commands and internal QA tooling

       The debug commands for pg_autoctl are only available when the environment  variable  PG_AUTOCTL_DEBUG  is
       set (to any value).

       When  testing  pg_auto_failover,  it  is  helpful  to be able to play with the local nodes using the same
       lower-level API as used by the pg_auto_failover Finite State Machine transitions. Some commands could  be
       useful  in  contexts  other than pg_auto_failover development and QA work, so some documentation has been
       made available.

   pg_autoctl do tmux
       pg_autoctl do tmux - Set of facilities to handle tmux interactive sessions

   Synopsis
       pg_autoctl do tmux provides the following commands:

          pg_autoctl do tmux
           script   Produce a tmux script for a demo or a test case (debug only)
           session  Run a tmux session for a demo or a test case
           stop     Stop pg_autoctl processes that belong to a tmux session
           wait     Wait until a given node has been registered on the monitor
           clean    Clean-up a tmux session processes and root dir

   Description
       An easy way to get started with pg_auto_failover in a localhost only formation with three nodes is to run
       the following command:

          $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux session \
               --root /tmp/pgaf \
                   --first-pgport 9000 \
                   --nodes 4 \
                   --layout tiled

       This requires the command tmux to be available in your PATH. The pg_autoctl do tmux session command
       prepares a self-contained root directory in which to create the pg_auto_failover nodes and their
       configuration, then prepares a tmux script, and finally runs the script with a command such as:

          /usr/local/bin/tmux -v start-server ; source-file /tmp/pgaf/script-9000.tmux

       The tmux session contains a single tmux window with multiple panes:

          • one pane for the monitor

          • one pane per Postgres node, here 4 of them

          • one pane for running watch pg_autoctl show state

          • one extra pane for an interactive shell.

       Usually the first two commands to run in the interactive shell, once the formation is stable (one node is
       primary, the other ones are all secondary), are the following:

          $ pg_autoctl get formation settings
          $ pg_autoctl perform failover
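
       When you are done with the session, the stop and clean sub-commands listed in the synopsis above can be
       used to terminate the pg_autoctl processes and remove the root directory. A minimal sketch, assuming
       these sub-commands accept the same --root and --first-pgport options as the session sub-command (check
       --help for the exact options):

          $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux stop --root /tmp/pgaf --first-pgport 9000
          $ PG_AUTOCTL_DEBUG=1 pg_autoctl do tmux clean --root /tmp/pgaf --first-pgport 9000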

   pg_autoctl do demo
       pg_autoctl do demo - Use a demo application for pg_auto_failover

   Synopsis
       pg_autoctl do demo provides the following commands:

          pg_autoctl do demo
           run      Run the pg_auto_failover demo application
           uri      Grab the application connection string from the monitor
           ping     Attempt to connect to the application URI
           summary  Display a summary of the previous demo app run

       To run a demo, use pg_autoctl do demo run:

          usage: pg_autoctl do demo run [option ...]

          --monitor        Postgres URI of the pg_auto_failover monitor
          --formation      Formation to use (default)
          --group          Group Id to failover (0)
          --username       PostgreSQL's username
          --clients        How many client processes to use (1)
          --duration       Duration of the demo app, in seconds (30)
          --first-failover Timing of the first failover (10)
          --failover-freq  Seconds between subsequent failovers (45)

   Description
       The pg_autoctl debug tooling includes a demo application.

       The demo prepares its Postgres schema on the target database, and then starts several clients (see
       --clients) that concurrently connect to the target application URI and record the time it took to
       establish the Postgres connection to the current read-write node, along with the retry policy metrics.

   Example
          $ pg_autoctl do demo run --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
          14:43:35 19660 INFO  Using application connection string "postgres://localhost:5502,localhost:5503,localhost:5501/demo?target_session_attrs=read-write&sslmode=prefer"
          14:43:35 19660 INFO  Using Postgres user PGUSER "dim"
          14:43:35 19660 INFO  Preparing demo schema: drop schema if exists demo cascade
          14:43:35 19660 WARN  NOTICE:  schema "demo" does not exist, skipping
          14:43:35 19660 INFO  Preparing demo schema: create schema demo
          14:43:35 19660 INFO  Preparing demo schema: create table demo.tracking(ts timestamptz default now(), client integer, loop integer, retries integer, us bigint, recovery bool)
          14:43:36 19660 INFO  Preparing demo schema: create table demo.client(client integer, pid integer, retry_sleep_ms integer, retry_cap_ms integer, failover_count integer)
          14:43:36 19660 INFO  Starting 10 concurrent clients as sub-processes
          14:43:36 19675 INFO  Failover client is started, will failover in 10s and every 45s after that
          ...

          $ pg_autoctl do demo summary --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover?sslmode=prefer' --clients 10
          14:44:27 22789 INFO  Using application connection string "postgres://localhost:5503,localhost:5501,localhost:5502/demo?target_session_attrs=read-write&sslmode=prefer"
          14:44:27 22789 INFO  Using Postgres user PGUSER "dim"
          14:44:27 22789 INFO  Summary for the demo app running with 10 clients for 30s
                  Client        | Connections | Retries | Min Connect Time (ms) |   max    |   p95   |   p99
          ----------------------+-------------+---------+-----------------------+----------+---------+---------
           Client 1             |         136 |      14 |                58.318 | 2601.165 | 244.443 | 261.809
           Client 2             |         136 |       5 |                55.199 | 2514.968 | 242.362 | 259.282
           Client 3             |         134 |       6 |                55.815 | 2974.247 | 241.740 | 262.908
           Client 4             |         135 |       7 |                56.542 | 2970.922 | 238.995 | 251.177
           Client 5             |         136 |       8 |                58.339 | 2758.106 | 238.720 | 252.439
           Client 6             |         134 |       9 |                58.679 | 2813.653 | 244.696 | 254.674
           Client 7             |         134 |      11 |                58.737 | 2795.974 | 243.202 | 253.745
           Client 8             |         136 |      12 |                52.109 | 2354.952 | 242.664 | 254.233
           Client 9             |         137 |      19 |                59.735 | 2628.496 | 235.668 | 253.582
           Client 10            |         133 |       6 |                57.994 | 3060.489 | 242.156 | 256.085
           All Clients Combined |        1351 |      97 |                52.109 | 3060.489 | 241.848 | 258.450
          (11 rows)

           Min Connect Time (ms) |   max    | freq |                      bar
          -----------------------+----------+------+-----------------------------------------------
                          52.109 |  219.105 | 1093 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
                         219.515 |  267.168 |  248 | ▒▒▒▒▒▒▒▒▒▒
                        2354.952 | 2354.952 |    1 |
                        2514.968 | 2514.968 |    1 |
                        2601.165 | 2628.496 |    2 |
                        2758.106 | 2813.653 |    3 |
                        2970.922 | 2974.247 |    2 |
                        3060.489 | 3060.489 |    1 |
          (8 rows)
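
       The demo.tracking table created by the demo run (see its schema in the output above) can also be queried
       directly with psql. The following sketch, assuming the us column records the connection time in
       microseconds, computes a per-client summary similar to the one printed by pg_autoctl do demo summary:

          select client,
                 count(*) as connections,
                 sum(retries) as retries,
                 round((percentile_cont(0.95) within group (order by us))::numeric / 1000, 3) as p95_ms
            from demo.tracking
           group by client
           order by client;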

   pg_autoctl do service restart
       pg_autoctl do service restart - Run pg_autoctl sub-processes (services)

   Synopsis
       pg_autoctl do service restart provides the following commands:

          pg_autoctl do service restart
           postgres     Restart the pg_autoctl postgres controller service
           listener     Restart the pg_autoctl monitor listener service
           node-active  Restart the pg_autoctl keeper node-active service

   Description
       It is possible to restart the pg_autoctl or the Postgres service  without  affecting  the  other  running
       service. Typically, to restart the pg_autoctl parts without impacting Postgres:

          $ pg_autoctl do service restart node-active --pgdata node1
          14:52:06 31223 INFO  Sending the TERM signal to service "node-active" with pid 26626
          14:52:06 31223 INFO  Service "node-active" has been restarted with pid 31230
          31230

       The Postgres service has not been impacted by the restart of the pg_autoctl process.
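
       To verify which pid a given service is currently running with, before or after a restart, the companion
       pg_autoctl do service getpid sub-commands (listed further down in this section) can be used, for
       instance:

          $ pg_autoctl do service getpid node-active --pgdata node1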

   pg_autoctl do show
       pg_autoctl do show - Show some debug level information

   Synopsis
       The  commands  pg_autoctl_create_monitor  and  pg_autoctl_create_postgres  both  implement  some level of
       automated detection of the node network settings when the option --hostname is not used.

       In addition to those commands, when a new node is registered to the monitor, other nodes also edit their
       Postgres HBA rules to allow the new node to connect, unless the option --skip-pg-hba has been used.

       The debug sub-commands for pg_autoctl do show can be used to see in detail the network discovery done by
       pg_autoctl.

       pg_autoctl do show provides the following commands:

          pg_autoctl do show
            ipaddr    Print this node's IP address information
            cidr      Print this node's CIDR information
            lookup    Print this node's DNS lookup information
            hostname  Print this node's default hostname
            reverse   Lookup given hostname and check reverse DNS setup

   pg_autoctl do show ipaddr
       Connects  to  an external IP address and uses getsockname(2) to retrieve the current address to which the
       socket is bound.

       The external IP address defaults to 8.8.8.8, the IP address of a Google provided public DNS server, or to
       the monitor IP address or hostname in the context of pg_autoctl_create_postgres.

          $ pg_autoctl do show ipaddr
          16:42:40 62631 INFO  ipaddr.c:107: Connecting to 8.8.8.8 (port 53)
          192.168.1.156

   pg_autoctl do show cidr
       Connects to an external IP address in the same way as the previous command pg_autoctl do show ipaddr, and
       then matches the local socket name with the list of local network interfaces. When a match is found, it
       uses the netmask of the interface to compute the CIDR notation from the IP address.

       The computed CIDR notation is then used in HBA rules.

          $ pg_autoctl do show cidr
          16:43:19 63319 INFO  Connecting to 8.8.8.8 (port 53)
          192.168.1.0/24

   pg_autoctl do show hostname
       Uses either its first (and only) argument or the result of gethostname(2) as the candidate hostname to
       use in HBA rules, and then checks that the hostname resolves to an IP address that belongs to one of the
       machine's network interfaces.

       When the hostname forward DNS lookup resolves to an IP address that is local to the node where the
       command is run, then a reverse-lookup from the IP address is made to see if it matches the candidate
       hostname.

          $ pg_autoctl do show hostname
          DESKTOP-IC01GOOS.europe.corp.microsoft.com

          $ pg_autoctl -vv do show hostname 'postgres://autoctl_node@localhost:5500/pg_auto_failover'
          13:45:00 93122 INFO  cli_do_show.c:256: Using monitor hostname "localhost" and port 5500
          13:45:00 93122 INFO  ipaddr.c:107: Connecting to ::1 (port 5500)
          13:45:00 93122 DEBUG cli_do_show.c:272: cli_show_hostname: ip ::1
          13:45:00 93122 DEBUG cli_do_show.c:283: cli_show_hostname: host localhost
          13:45:00 93122 DEBUG cli_do_show.c:294: cli_show_hostname: ip ::1
          localhost

   pg_autoctl do show lookup
       Checks that the given argument is a hostname that resolves to a local IP address, that is an IP address
       associated with a local network interface.

          $ pg_autoctl do show lookup DESKTOP-IC01GOOS.europe.corp.microsoft.com
          DESKTOP-IC01GOOS.europe.corp.microsoft.com: 192.168.1.156

   pg_autoctl do show reverse
       Implements  the  same  DNS  checks  as Postgres HBA matching code: first does a forward DNS lookup of the
       given hostname, and then a reverse-lookup from all the IP addresses obtained. Success is reached when  at
       least  one  of the IP addresses from the forward lookup resolves back to the given hostname (as the first
       answer to the reverse DNS lookup).

          $ pg_autoctl do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
          16:44:49 64910 FATAL Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
          16:44:49 64910 INFO  Continuing with IP address "192.168.1.156"

          $ pg_autoctl -vv do show reverse DESKTOP-IC01GOOS.europe.corp.microsoft.com
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
          16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 192.168.1.156
          16:44:45 64832 DEBUG ipaddr.c:733: reverse lookup for "192.168.1.156" gives "desktop-ic01goos.europe.corp.microsoft.com" first
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 2a01:110:10:40c::2ad
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 DEBUG ipaddr.c:719: DESKTOP-IC01GOOS.europe.corp.microsoft.com has address 100.64.34.213
          16:44:45 64832 DEBUG ipaddr.c:728: Failed to resolve hostname from address "192.168.1.156": nodename nor servname provided, or not known
          16:44:45 64832 FATAL cli_do_show.c:333: Failed to find an IP address for hostname "DESKTOP-IC01GOOS.europe.corp.microsoft.com" that matches hostname again in a reverse-DNS lookup.
          16:44:45 64832 INFO  cli_do_show.c:334: Continuing with IP address "192.168.1.156"

   pg_autoctl do pgsetup
       pg_autoctl do pgsetup - Manage a local Postgres setup

   Synopsis
       The main pg_autoctl commands implement low-level management tooling for a local Postgres instance. Some
       of these low-level Postgres commands can also be useful as standalone tools in some cases.

       pg_autoctl do pgsetup provides the following commands:

          pg_autoctl do pgsetup
            pg_ctl    Find a non-ambiguous pg_ctl program and Postgres version
            discover  Discover local PostgreSQL instance, if any
            ready     Return true if the local Postgres server is ready
            wait      Wait until the local Postgres server is ready
            logs      Outputs the Postgres startup logs
            tune      Compute and log some Postgres tuning options

   pg_autoctl do pgsetup pg_ctl
       In a similar way to which -a, this command scans your PATH for pg_ctl entries. Then it runs the pg_ctl
       --version command and parses the output to determine the version of Postgres that is available in the
       path.

          $ pg_autoctl do pgsetup pg_ctl --pgdata node1
          16:49:18 69684 INFO  Environment variable PG_CONFIG is set to "/Applications/Postgres.app//Contents/Versions/12/bin/pg_config"
          16:49:18 69684 INFO  `pg_autoctl create postgres` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3
          16:49:18 69684 INFO  `pg_autoctl create monitor` would use "/Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl" for Postgres 12.3

   pg_autoctl do pgsetup discover
       Given  a  PGDATA  or  --pgdata  option,  the  command discovers if a running Postgres service matches the
       pg_autoctl setup, and prints the information that pg_autoctl typically needs  when  managing  a  Postgres
       instance.

          $ pg_autoctl do pgsetup discover --pgdata node1
          pgdata:                /Users/dim/dev/MS/pg_auto_failover/tmux/node1
          pg_ctl:                /Applications/Postgres.app/Contents/Versions/12/bin/pg_ctl
          pg_version:            12.3
          pghost:                /tmp
          pgport:                5501
          proxyport:             0
          pid:                   21029
          is in recovery:        no
          Control Version:       1201
          Catalog Version:       201909212
          System Identifier:     6942422768095393833
          Latest checkpoint LSN: 0/4059C18
          Postmaster status:     ready

   pg_autoctl do pgsetup ready
       Similar to the pg_isready command, though it uses the Postgres specifications found in the pg_autoctl
       node setup.

          $ pg_autoctl do pgsetup ready --pgdata node1
          16:50:08 70582 INFO  Postgres status is: "ready"

   pg_autoctl do pgsetup wait
       When pg_autoctl do pgsetup ready would return false because Postgres is not ready yet, this command
       continues probing every second for 30 seconds, and exits as soon as Postgres is ready.

          $ pg_autoctl do pgsetup wait --pgdata node1
          16:50:22 70829 INFO  Postgres is now serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 21029
          16:50:22 70829 INFO  Postgres status is: "ready"

   pg_autoctl do pgsetup logs
       Outputs the Postgres logs from the most recent log file in the PGDATA/log directory.

          $ pg_autoctl do pgsetup logs --pgdata node1
          16:50:39 71126 WARN  Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/startup.log":
          16:50:39 71126 INFO  2021-03-22 14:43:48.911 CET [21029] LOG:  starting PostgreSQL 12.3 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on IPv6 address "::", port 5501
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on IPv4 address "0.0.0.0", port 5501
          16:50:39 71126 INFO  2021-03-22 14:43:48.913 CET [21029] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5501"
          16:50:39 71126 INFO  2021-03-22 14:43:48.931 CET [21029] LOG:  redirecting log output to logging collector process
          16:50:39 71126 INFO  2021-03-22 14:43:48.931 CET [21029] HINT:  Future log output will appear in directory "log".
          16:50:39 71126 WARN  Postgres logs from "/Users/dim/dev/MS/pg_auto_failover/tmux/node1/log/postgresql-2021-03-22_144348.log":
          16:50:39 71126 INFO  2021-03-22 14:43:48.937 CET [21033] LOG:  database system was shut down at 2021-03-22 14:43:46 CET
          16:50:39 71126 INFO  2021-03-22 14:43:48.937 CET [21033] LOG:  entering standby mode
          16:50:39 71126 INFO  2021-03-22 14:43:48.942 CET [21033] LOG:  consistent recovery state reached at 0/4022E88
          16:50:39 71126 INFO  2021-03-22 14:43:48.942 CET [21033] LOG:  invalid record length at 0/4022E88: wanted 24, got 0
          16:50:39 71126 INFO  2021-03-22 14:43:48.946 CET [21029] LOG:  database system is ready to accept read only connections
          16:50:39 71126 INFO  2021-03-22 14:43:49.032 CET [21038] LOG:  fetching timeline history file for timeline 4 from primary server
          16:50:39 71126 INFO  2021-03-22 14:43:49.037 CET [21038] LOG:  started streaming WAL from primary at 0/4000000 on timeline 3
          16:50:39 71126 INFO  2021-03-22 14:43:49.046 CET [21038] LOG:  replication terminated by primary server
          16:50:39 71126 INFO  2021-03-22 14:43:49.046 CET [21038] DETAIL:  End of WAL reached on timeline 3 at 0/4022E88.
          16:50:39 71126 INFO  2021-03-22 14:43:49.047 CET [21033] LOG:  new target timeline is 4
          16:50:39 71126 INFO  2021-03-22 14:43:49.049 CET [21038] LOG:  restarted WAL streaming at 0/4000000 on timeline 4
          16:50:39 71126 INFO  2021-03-22 14:43:49.210 CET [21033] LOG:  redo starts at 0/4022E88
          16:50:39 71126 INFO  2021-03-22 14:52:06.692 CET [21029] LOG:  received SIGHUP, reloading configuration files
          16:50:39 71126 INFO  2021-03-22 14:52:06.906 CET [21029] LOG:  received SIGHUP, reloading configuration files
          16:50:39 71126 FATAL 2021-03-22 15:34:24.920 CET [21038] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 15:34:24.973 CET [21033] LOG:  invalid record length at 0/4059CC8: wanted 24, got 0
          16:50:39 71126 INFO  2021-03-22 15:34:25.105 CET [35801] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4
          16:50:39 71126 FATAL 2021-03-22 16:12:56.918 CET [35801] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 16:12:57.086 CET [38741] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4
          16:50:39 71126 FATAL 2021-03-22 16:23:39.349 CET [38741] FATAL:  terminating walreceiver due to timeout
          16:50:39 71126 INFO  2021-03-22 16:23:39.497 CET [41635] LOG:  started streaming WAL from primary at 0/4000000 on timeline 4

   pg_autoctl do pgsetup tune
       Outputs the pg_autoctl automated tuning options. Depending on the number of CPUs and the amount of RAM
       detected in the environment where it is run, pg_autoctl can adjust some very basic Postgres tuning knobs
       to get started.

          $ pg_autoctl do pgsetup tune --pgdata node1 -vv
          13:25:25 77185 DEBUG pgtuning.c:85: Detected 12 CPUs and 16 GB total RAM on this server
          13:25:25 77185 DEBUG pgtuning.c:225: Setting autovacuum_max_workers to 3
          13:25:25 77185 DEBUG pgtuning.c:228: Setting shared_buffers to 4096 MB
          13:25:25 77185 DEBUG pgtuning.c:231: Setting work_mem to 24 MB
          13:25:25 77185 DEBUG pgtuning.c:235: Setting maintenance_work_mem to 512 MB
          13:25:25 77185 DEBUG pgtuning.c:239: Setting effective_cache_size to 12 GB
          # basic tuning computed by pg_auto_failover
          track_functions = pl
          shared_buffers = '4096 MB'
          work_mem = '24 MB'
          maintenance_work_mem = '512 MB'
          effective_cache_size = '12 GB'
          autovacuum_max_workers = 3
          autovacuum_vacuum_scale_factor = 0.08
          autovacuum_analyze_scale_factor = 0.02

       The low-level API is made available through the following pg_autoctl do commands, only available in debug
       environments:

          pg_autoctl do
          + monitor  Query a pg_auto_failover monitor
          + fsm      Manually manage the keeper's state
          + primary  Manage a PostgreSQL primary server
          + standby  Manage a PostgreSQL standby server
          + show     Show some debug level information
          + pgsetup  Manage a local Postgres setup
          + pgctl    Signal the pg_autoctl postgres service
          + service  Run pg_autoctl sub-processes (services)
          + tmux     Set of facilities to handle tmux interactive sessions
          + azure    Manage a set of Azure resources for a pg_auto_failover demo
          + demo     Use a demo application for pg_auto_failover

          pg_autoctl do monitor
          + get                 Get information from the monitor
            register            Register the current node with the monitor
            active              Call in the pg_auto_failover Node Active protocol
            version             Check that monitor version is 1.5.0.1; alter extension update if not
            parse-notification  parse a raw notification message

          pg_autoctl do monitor get
            primary      Get the primary node from pg_auto_failover in given formation/group
            others       Get the other nodes from the pg_auto_failover group of hostname/port
            coordinator  Get the coordinator node from the pg_auto_failover formation

          pg_autoctl do fsm
            init    Initialize the keeper's state on-disk
            state   Read the keeper's state from disk and display it
            list    List reachable FSM states from current state
            gv      Output the FSM as a .gv program suitable for graphviz/dot
            assign  Assign a new goal state to the keeper
            step    Make a state transition if instructed by the monitor
          + nodes   Manually manage the keeper's nodes list

          pg_autoctl do fsm nodes
            get  Get the list of nodes from file (see --disable-monitor)
            set  Set the list of nodes to file (see --disable-monitor)

          pg_autoctl do primary
          + slot      Manage replication slot on the primary server
          + adduser   Create users on primary
            defaults  Add default settings to postgresql.conf
            identify  Run the IDENTIFY_SYSTEM replication command on given host

          pg_autoctl do primary slot
            create  Create a replication slot on the primary server
            drop    Drop a replication slot on the primary server

          pg_autoctl do primary adduser
            monitor  add a local user for queries from the monitor
            replica  add a local user with replication privileges

          pg_autoctl do standby
            init     Initialize the standby server using pg_basebackup
            rewind   Rewind a demoted primary server using pg_rewind
            promote  Promote a standby server to become writable

          pg_autoctl do show
            ipaddr    Print this node's IP address information
            cidr      Print this node's CIDR information
            lookup    Print this node's DNS lookup information
            hostname  Print this node's default hostname
            reverse   Lookup given hostname and check reverse DNS setup

          pg_autoctl do pgsetup
            pg_ctl    Find a non-ambiguous pg_ctl program and Postgres version
            discover  Discover local PostgreSQL instance, if any
            ready     Return true if the local Postgres server is ready
            wait      Wait until the local Postgres server is ready
            logs      Outputs the Postgres startup logs
            tune      Compute and log some Postgres tuning options

          pg_autoctl do pgctl
            on   Signal pg_autoctl postgres service to ensure Postgres is running
            off  Signal pg_autoctl postgres service to ensure Postgres is stopped

          pg_autoctl do service
          + getpid        Get the pid of pg_autoctl sub-processes (services)
          + restart       Restart pg_autoctl sub-processes (services)
            pgcontroller  pg_autoctl supervised postgres controller
            postgres      pg_autoctl service that start/stop postgres when asked
            listener      pg_autoctl service that listens to the monitor notifications
            node-active   pg_autoctl service that implements the node active protocol

          pg_autoctl do service getpid
            postgres     Get the pid of the pg_autoctl postgres controller service
            listener     Get the pid of the pg_autoctl monitor listener service
            node-active  Get the pid of the pg_autoctl keeper node-active service

          pg_autoctl do service restart
            postgres     Restart the pg_autoctl postgres controller service
            listener     Restart the pg_autoctl monitor listener service
            node-active  Restart the pg_autoctl keeper node-active service

          pg_autoctl do tmux
            script   Produce a tmux script for a demo or a test case (debug only)
            session  Run a tmux session for a demo or a test case
            stop     Stop pg_autoctl processes that belong to a tmux session
            wait     Wait until a given node has been registered on the monitor
            clean    Clean-up a tmux session processes and root dir

          pg_autoctl do azure
          + provision  provision azure resources for a pg_auto_failover demo
          + tmux       Run a tmux session with an Azure setup for QA/testing
          + show       show azure resources for a pg_auto_failover demo
            deploy     Deploy a pg_autoctl VMs, given by name
            create     Create an azure QA environment
            drop       Drop an azure QA environment: resource group, network, VMs
            ls         List resources in a given azure region
            ssh        Runs ssh -l ha-admin <public ip address> for a given VM name
            sync       Rsync pg_auto_failover sources on all the target region VMs

          pg_autoctl do azure provision
            region  Provision an azure region: resource group, network, VMs
            nodes   Provision our pre-created VM with pg_autoctl Postgres nodes

          pg_autoctl do azure tmux
            session  Create or attach a tmux session for the created Azure VMs
            kill     Kill an existing tmux session for Azure VMs

          pg_autoctl do azure show
            ips    Show public and private IP addresses for selected VMs
            state  Connect to the monitor node to show the current state

          pg_autoctl do demo
            run      Run the pg_auto_failover demo application
            uri      Grab the application connection string from the monitor
            ping     Attempt to connect to the application URI
            summary  Display a summary of the previous demo app run

   pg_autoctl run
       pg_autoctl run - Run the pg_autoctl service (monitor or keeper)

   Synopsis
       This command starts the processes needed to run a monitor node or a keeper node, depending on the
       configuration file that belongs to the --pgdata option or PGDATA environment variable.

          usage: pg_autoctl run  [ --pgdata --name --hostname --pgport ]

          --pgdata      path to data directory
          --name        pg_auto_failover node name
          --hostname    hostname used to connect from other nodes
          --pgport      PostgreSQL's port number

   Description
       When registering Postgres nodes to the  pg_auto_failover  monitor  using  the  pg_autoctl_create_postgres
       command, the nodes are registered with metadata: the node name, hostname and Postgres port.

       The  node  name  is  used  mostly  in  the  logs  and  pg_autoctl_show_state  commands  and  helps  human
       administrators of the formation.

       The node hostname and pgport are used by other nodes, including the pg_auto_failover monitor, to  open  a
       Postgres connection.

       The node name, the node hostname, and the port can all be changed after the node registration by using
       either this command (pg_autoctl run) or the pg_autoctl_config_set command.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --name Node name used on the monitor to refer to this node. The hostname is technical information, and
              given Postgres requirements on the HBA setup and DNS resolution (both forward and reverse
              lookups), IP addresses are often used for the hostname.

              The --name option allows using a user-friendly name for your Postgres nodes.

       --hostname
              Hostname or IP address (both v4 and v6 are supported) to use from any other  node  to  connect  to
              this node.

              When not provided, a default value is computed by running the following algorithm.

                 1. We  get  this machine's "public IP" by opening a connection to the given monitor hostname or
                    IP address. Then we get TCP/IP client address that has been used to make that connection.

                 2. We then do a reverse DNS lookup on the IP address found in the  previous  step  to  fetch  a
                    hostname for our local machine.

                  3. If the reverse DNS lookup is successful, then pg_autoctl does a forward DNS lookup of that
                     hostname.

              When the forward DNS lookup response in step 3. is an IP address found in one of our local network
              interfaces, then pg_autoctl uses the  hostname  found  in  step  2.  as  the  default  --hostname.
              Otherwise it uses the IP address found in step 1.

              You  may  use  the --hostname command line option to bypass the whole DNS lookup based process and
              force the local node name to a fixed value.

       --pgport
              Postgres port to use, defaults to 5432.
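
   Example
       A minimal sketch of starting a keeper node while also updating the node name and hostname registered on
       the monitor, as described above (all values are illustrative):

          $ pg_autoctl run --pgdata node1 --name node1 --hostname node1.db --pgport 5501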

   pg_autoctl watch
       pg_autoctl watch - Display an auto-updating dashboard

   Synopsis
       This command outputs the events that the pg_auto_failover monitor records about state changes of the
       pg_auto_failover nodes it manages:

          usage: pg_autoctl watch  [ --pgdata --formation --group ]

          --pgdata      path to data directory
          --monitor     show the monitor uri
          --formation   formation to query, defaults to 'default'
          --group       group to query formation, defaults to all
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --monitor
              Postgres  URI  used  to  connect to the monitor. Must use the autoctl_node username and target the
              pg_auto_failover database name. It is possible to show the Postgres  URI  from  the  monitor  node
              using the command pg_autoctl_show_uri.

       --formation
              List the events recorded for nodes in the given formation. Defaults to default.

       --group
               Limit output to a single group in the formation. Defaults to including all groups registered in
               the target formation.

   Description
       The pg_autoctl watch output is divided into 3 sections.

       The first section is a single header line which includes the name of the currently selected formation,
       the formation replication setting number_sync_standbys, and then, in the rightmost position, the current
       time.

       The second section displays one line per node, and each line contains a list of columns that describe the
       current state for the node. This list can include the following columns; which columns are part of the
       output depends on the terminal window size. This choice is dynamic and changes if your terminal window
       size changes:

          • Name
                Name of the node.

          • Node, or Id
                Node  information.  When  the  formation  has a single group (group zero), then this column only
                contains the nodeId.

                Only Citus formations allow several groups.  When  using  a  Citus  formation  the  Node  column
                contains the groupId and the nodeId, separated by a colon, such as 0:1 for the first coordinator
                node.

          • Last Report, or Report
                Time interval between now and the last known time when a node has reported to the monitor, using
                the node_active protocol.

                 This value is expected to stay under about 2s, and is known to increase when either the
                 pg_autoctl run service is not running, or when there is a network split.

          • Last Check, or Check
                 Time interval between now and the last known time when the monitor could connect to a node's
                 Postgres instance, via its health check mechanism.

                This  value  is known to increment when either the Postgres service is not running on the target
                node, when there is a network split, or when the internal machinery  (the  health  check  worker
                background process) implements jitter.

          • Host:Port
                Hostname and port number used to connect to the node.

          • TLI: LSN
                Timeline identifier (TLI) and Postgres Log Sequence Number (LSN).

                The  LSN  is  the current position in the Postgres WAL stream. This is a hexadecimal number. See
                pg_lsn for more information.

                The current timeline is incremented each time a failover happens, or when doing  Point  In  Time
                Recovery.  A  node  can  only  reach  the secondary state when it is on the same timeline as its
                primary node.

          • Connection
                 This output field contains two bits of information. First, the Postgres connection type that
                 the node provides, either read-write or read-only. Then the mark ! is added when the monitor
                 has failed to connect to this node, and ? when the monitor has not connected to the node yet.

          • Reported State
                The  current  FSM  state  as  reported  to  the monitor by the pg_autoctl process running on the
                Postgres node.

          • Assigned State
                 The assigned FSM state on the monitor. When the assigned state is not the same as the reported
                 state, then the pg_autoctl process running on the Postgres node might not have retrieved the
                 assigned state yet, or might still be implementing the FSM transition from the current state to
                 the assigned state.

       The third and last section lists the most recent events that the monitor has registered; the most recent
       event is found at the bottom of the screen.

       To quit the command hit either the F1 key or the q key.
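
   Example
       A typical invocation targets the local node's data directory, or uses --monitor to connect from any host
       (the values below are illustrative):

          $ pg_autoctl watch --pgdata node1
          $ pg_autoctl watch --monitor 'postgres://autoctl_node@localhost:5500/pg_auto_failover' --formation default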

   pg_autoctl stop
       pg_autoctl stop - signal the pg_autoctl service for it to stop

   Synopsis
       This command stops the processes needed to run a monitor node or a keeper node, depending on the
       configuration file that belongs to the --pgdata option or PGDATA environment variable.

          usage: pg_autoctl stop  [ --pgdata --fast --immediate ]

          --pgdata      path to data directory
          --fast        fast shutdown mode for the keeper
          --immediate   immediate shutdown mode for the keeper

   Description
       The pg_autoctl stop command finds the PID of the running service for the given --pgdata, and if the
       process is still running, sends a SIGTERM signal to the process.

       When pg_autoctl receives a shutdown signal, a shutdown sequence is triggered. Depending on the signal
       received, an operation that has been started (such as a state transition) is either run to completion,
       stopped at the next opportunity, or stopped immediately even when in the middle of the transition.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.

       --fast Fast Shutdown mode for pg_autoctl. Sends the SIGINT signal to the running service,  which  is  the
              same as using C-c on an interactive process running as a foreground shell job.

       --immediate
              Immediate Shutdown mode for pg_autoctl. Sends the SIGQUIT signal to the running service.
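
   Example
       A minimal sketch, using the fast shutdown mode on a local node (the --pgdata value is illustrative):

          $ pg_autoctl stop --pgdata node1 --fast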

   pg_autoctl reload
       pg_autoctl reload - signal the pg_autoctl for it to reload its configuration

   Synopsis
       This command signals a running pg_autoctl process to reload its configuration from disk, and also signals
       the managed Postgres service to reload its configuration.

          usage: pg_autoctl reload  [ --pgdata ] [ --json ]

          --pgdata      path to data directory

   Description
       The pg_autoctl reload command finds the PID of the running service for the given --pgdata, and if the
       process is still running, sends a SIGHUP signal to the process.

   Options
       --pgdata
              Location of the Postgres node being managed locally. Defaults to the environment variable  PGDATA.
              Use  --monitor  to connect to a monitor from anywhere, rather than the monitor URI used by a local
              Postgres node managed with pg_autoctl.
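
   Example
       A minimal sketch, signalling the pg_autoctl process that manages a local node (the --pgdata value is
       illustrative):

          $ pg_autoctl reload --pgdata node1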

   pg_autoctl status
       pg_autoctl status - Display the current status of the pg_autoctl service

   Synopsis
       This command outputs the current process status for the pg_autoctl service running for the given
       --pgdata location.

          usage: pg_autoctl status  [ --pgdata ] [ --json ]

          --pgdata      path to data directory
          --json        output data in the JSON format

   Options
       --pgdata
              Location  of the Postgres node being managed locally. Defaults to the environment variable PGDATA.
              Use --monitor to connect to a monitor from anywhere, rather than the monitor URI used by  a  local
              Postgres node managed with pg_autoctl.

       --json Output JSON formatted data instead of a table formatted list.

   Example
          $ pg_autoctl status --pgdata node1
          11:26:30 27248 INFO  pg_autoctl is running with pid 26618
          11:26:30 27248 INFO  Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725

          $ pg_autoctl status --pgdata node1 --json
          11:26:37 27385 INFO  pg_autoctl is running with pid 26618
          11:26:37 27385 INFO  Postgres is serving PGDATA "/Users/dim/dev/MS/pg_auto_failover/tmux/node1" on port 5501 with pid 26725
          {
              "postgres": {
                  "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
                  "pg_ctl": "\/Applications\/Postgres.app\/Contents\/Versions\/12\/bin\/pg_ctl",
                  "version": "12.3",
                  "host": "\/tmp",
                  "port": 5501,
                  "proxyport": 0,
                  "pid": 26725,
                  "in_recovery": false,
                  "control": {
                      "version": 0,
                      "catalog_version": 0,
                      "system_identifier": "0"
                  },
                  "postmaster": {
                      "status": "ready"
                  }
              },
              "pg_autoctl": {
                  "pid": 26618,
                  "status": "running",
                  "pgdata": "\/Users\/dim\/dev\/MS\/pg_auto_failover\/tmux\/node1",
                  "version": "1.5.0",
                  "semId": 196609,
                  "services": [
                      {
                          "name": "postgres",
                          "pid": 26625,
                          "status": "running",
                          "version": "1.5.0",
                          "pgautofailover": "1.5.0.1"
                      },
                      {
                          "name": "node-active",
                          "pid": 26626,
                          "status": "running",
                          "version": "1.5.0",
                          "pgautofailover": "1.5.0.1"
                      }
                  ]
              }
          }

CONFIGURING PG_AUTO_FAILOVER

       Several default settings of pg_auto_failover can be reviewed and changed depending on the trade-offs you
       want to implement in your own production setup. The settings that you can change have an impact on the
       following operations:

          • Deciding when to promote the secondary

            pg_auto_failover decides to implement a failover to the secondary node  when  it  detects  that  the
            primary  node  is  unhealthy.  Changing  the  following  settings  will  have  an impact on when the
            pg_auto_failover monitor decides to promote the secondary PostgreSQL node:

                pgautofailover.health_check_max_retries
                pgautofailover.health_check_period
                pgautofailover.health_check_retry_delay
                pgautofailover.health_check_timeout
                pgautofailover.node_considered_unhealthy_timeout

          • Time taken to promote the secondary

             At secondary promotion time, pg_auto_failover waits for the following timeout to make sure that all
             pending writes on the primary server made it to the secondary at shutdown time, thus preventing
             data loss:

                pgautofailover.primary_demote_timeout

          • Preventing promotion of the secondary

            pg_auto_failover  implements  a  trade-off where data availability trumps service availability. When
            the primary node of a PostgreSQL service is detected unhealthy, the secondary is only promoted if it
            was known to be eligible at the moment when the primary is lost.

             When synchronous replication was in use at the moment when the primary node is lost, then we know
             we can switch to the secondary safely, and the WAL lag is 0 in that case.

             When the secondary server had been detected unhealthy before, then the pg_auto_failover monitor
             switches it from the state SECONDARY to the state CATCHING-UP, and promotion is then prevented.

             The following setting makes it possible to still promote the secondary, at the cost of a window of
             potential data loss:

                pgautofailover.promote_wal_log_threshold

   pg_auto_failover Monitor
       The  configuration for the behavior of the monitor happens in the PostgreSQL database where the extension
       has been deployed:

          pg_auto_failover=> select name, setting, unit, short_desc from pg_settings where name ~ 'pgautofailover.';
          -[ RECORD 1 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.enable_sync_wal_log_threshold
          setting    | 16777216
          unit       |
          short_desc | Don't enable synchronous replication until secondary xlog is within this many bytes of the primary's
          -[ RECORD 2 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_max_retries
          setting    | 2
          unit       |
          short_desc | Maximum number of re-tries before marking a node as failed.
          -[ RECORD 3 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_period
          setting    | 5000
          unit       | ms
          short_desc | Duration between each check (in milliseconds).
          -[ RECORD 4 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_retry_delay
          setting    | 2000
          unit       | ms
          short_desc | Delay between consecutive retries.
          -[ RECORD 5 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.health_check_timeout
          setting    | 5000
          unit       | ms
          short_desc | Connect timeout (in milliseconds).
          -[ RECORD 6 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.node_considered_unhealthy_timeout
          setting    | 20000
          unit       | ms
          short_desc | Mark node unhealthy if last ping was over this long ago
          -[ RECORD 7 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.primary_demote_timeout
          setting    | 30000
          unit       | ms
          short_desc | Give the primary this long to drain before promoting the secondary
          -[ RECORD 8 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.promote_wal_log_threshold
          setting    | 16777216
          unit       |
          short_desc | Don't promote secondary unless xlog is with this many bytes of the master
          -[ RECORD 9 ]----------------------------------------------------------------------------------------------------
          name       | pgautofailover.startup_grace_period
          setting    | 10000
          unit       | ms
          short_desc | Wait for at least this much time after startup before initiating a failover.

       You can edit the parameters as usual with PostgreSQL, either in the postgresql.conf file or  using  ALTER
       DATABASE pg_auto_failover SET parameter = value; commands, then issuing a reload.
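
       For instance, following the paragraph above, the unhealthy timeout could be changed with a command such
       as the following, run on the monitor node (the value is illustrative, and is expressed in milliseconds):

          ALTER DATABASE pg_auto_failover
            SET pgautofailover.node_considered_unhealthy_timeout TO 30000;

       When editing postgresql.conf on the monitor node instead, reload the configuration afterwards, for
       instance with SELECT pg_reload_conf(); or pg_ctl reload.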

   pg_auto_failover Keeper Service
       For  an  introduction  to  the pg_autoctl commands relevant to the pg_auto_failover Keeper configuration,
       please see pg_autoctl_config.

       An example configuration file looks like the following:

          [pg_autoctl]
          role = keeper
          monitor = postgres://autoctl_node@192.168.1.34:6000/pg_auto_failover
          formation = default
          group = 0
          hostname = node1.db
          nodekind = standalone

          [postgresql]
          pgdata = /data/pgsql/
          pg_ctl = /usr/pgsql-10/bin/pg_ctl
          dbname = postgres
          host = /tmp
          port = 5000

          [replication]
          slot = pgautofailover_standby
          maximum_backup_rate = 100M
          backup_directory = /data/backup/node1.db

          [timeout]
          network_partition_timeout = 20
          postgresql_restart_failure_timeout = 20
          postgresql_restart_failure_max_retries = 3

       To output, edit and check entries of the configuration, the following commands are provided:

          pg_autoctl config check [--pgdata <pgdata>]
          pg_autoctl config get [--pgdata <pgdata>] section.option
          pg_autoctl config set [--pgdata <pgdata>] section.option value

       The [postgresql] section is discovered automatically by the pg_autoctl command and is not intended to  be
       changed manually.
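
       The other entries can be read and changed from the command line, for example as follows; the values
       shown are illustrative only:

          $ pg_autoctl config get replication.maximum_backup_rate
          100M

          $ pg_autoctl config set timeout.network_partition_timeout 30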

       pg_autoctl.monitor

       PostgreSQL service URL of the pg_auto_failover monitor, as given in the output of the pg_autoctl show uri
       command.

       pg_autoctl.formation

       A single pg_auto_failover monitor may handle several Postgres formations. The default formation name,
       default, is usually fine.

       pg_autoctl.group

       This information is retrieved by the pg_auto_failover keeper when registering a node to the monitor,  and
       should not be changed afterwards. Use at your own risk.

       pg_autoctl.hostname

       Node hostname used by all the other nodes in the cluster to contact this node. In particular, if this
       node is a primary then its standby uses that address to set up streaming replication.

       replication.slot

       Name of the PostgreSQL replication slot used in the streaming replication setup automatically deployed by
       pg_auto_failover. Replication slots can't be renamed in PostgreSQL.

       replication.maximum_backup_rate

       When pg_auto_failover (re-)builds a standby node using the pg_basebackup command, this parameter is given
       to pg_basebackup to throttle the network bandwidth used. Defaults to 100Mbps.

       replication.backup_directory

       When pg_auto_failover (re-)builds a standby node using the pg_basebackup command, this parameter is the
       target directory into which to copy the data from the primary server. When the copy has been successful,
       the directory is renamed to postgresql.pgdata.

       The default value is computed as ${PGDATA}/../backup/${hostname} and can be set to any value of your
       preference. Remember that the directory renaming is an atomic operation only when both the source and
       the target of the copy are on the same filesystem, at least on Unix systems.

       timeout

       This section allows you to set up the behavior of the pg_auto_failover keeper in failure scenarios.

       timeout.network_partition_timeout

       Timeout in seconds before we consider that a failure to communicate with other nodes indicates a network
       partition. This check is only done on a PRIMARY server, so other nodes means both the monitor and the
       standby.

       When  a  PRIMARY  node  is detected to be on the losing side of a network partition, the pg_auto_failover
       keeper enters the DEMOTE state and stops the PostgreSQL instance in order to protect against split  brain
       situations.

       The default is 20s.

       timeout.postgresql_restart_failure_timeout

       timeout.postgresql_restart_failure_max_retries

       When PostgreSQL is not running, the first thing the pg_auto_failover keeper does is try to restart it. In
       case of a transient failure (e.g. file system is full, or other dynamic OS resource constraint), the best
       course of action is to try again for a little while before reaching out to the monitor and asking for a
       failover.

       The pg_auto_failover keeper tries to restart PostgreSQL timeout.postgresql_restart_failure_max_retries
       times in a row (default 3) or up to timeout.postgresql_restart_failure_timeout (default 20s) since it
       detected that PostgreSQL is not running, whichever comes first.
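
       Both settings can be adjusted with the configuration commands shown earlier; the values below are only
       examples:

          pg_autoctl config set timeout.postgresql_restart_failure_timeout 30
          pg_autoctl config set timeout.postgresql_restart_failure_max_retries 5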

OPERATING PG_AUTO_FAILOVER

       This section is not yet complete. Please contact us with any questions.

   Deployment
       pg_auto_failover is a general purpose tool for setting up PostgreSQL replication in  order  to  implement
       High Availability of the PostgreSQL service.

   Provisioning
       It  is  also  possible to register pre-existing PostgreSQL instances with a pg_auto_failover monitor. The
       pg_autoctl create command honors the PGDATA  environment  variable,  and  checks  whether  PostgreSQL  is
       already  running.  If  Postgres  is  detected,  the  new node is registered in SINGLE mode, bypassing the
       monitor's role assignment policy.
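
       As a sketch, assuming an existing instance in /var/lib/postgres/data and a monitor already created on a
       host named monitor.host (both the path and the host name are placeholders), the registration could look
       like this:

          $ export PGDATA=/var/lib/postgres/data
          $ pg_autoctl create postgres --hostname node1.db \
               --monitor postgres://autoctl_node@monitor.host:5432/pg_auto_failover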

   Upgrading pg_auto_failover, from versions 1.4 onward
       When upgrading a pg_auto_failover setup, the procedure is different on the monitor and  on  the  Postgres
       nodes:

          • on  the  monitor,  the  internal pg_auto_failover database schema might have changed and needs to be
            upgraded to its new definition, porting  the  existing  data  over.  The  pg_auto_failover  database
            contains the registration of every node in the system and their current state.
                It  is not possible to trigger a failover during the monitor update.  Postgres operations on the
                Postgres nodes continue normally.

                During the restart of the monitor, the other nodes might have trouble connecting to the monitor.
                The pg_autoctl command is designed  to  retry  connecting  to  the  monitor  and  handle  errors
                gracefully.

           • on the Postgres nodes, the pg_autoctl command connects to the monitor every once in a while (every
             second by default), and then calls the node_active protocol, a stored procedure in the monitor
             database.
                 pg_autoctl also verifies at each connection to the monitor that it's running the expected
                 version of the extension. When that's not the case, the "node-active" sub-process quits, to be
                 restarted with the possibly new version of the pg_autoctl binary found on-disk.

       As a result, here is the standard upgrade plan for pg_auto_failover:

           1. Upgrade the pg_auto_failover package on all the nodes, monitor included.
                  When using a Debian-based OS, this looks like the following commands when upgrading from 1.4
                  to 1.5:

                    sudo apt-get remove pg-auto-failover-cli-1.4 postgresql-11-auto-failover-1.4
                    sudo apt-get install -q -y pg-auto-failover-cli-1.5 postgresql-11-auto-failover-1.5

          2. Restart the pgautofailover service on the monitor.
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 Then we may use the following commands to make sure that the service is running as expected:

                    sudo systemctl status pgautofailover
                    sudo journalctl -u pgautofailover

                 At  this  point it is expected that the pg_autoctl logs show that an upgrade has been performed
                 by using the ALTER EXTENSION pgautofailover UPDATE TO ... command. The monitor  is  ready  with
                 the new version of pg_auto_failover.
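
                  To double-check, the installed version of the extension can be queried on the monitor
                  database; the query below is only an illustration:

                     psql -d pg_auto_failover -c "select extversion from pg_extension where extname = 'pgautofailover';"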

       When the Postgres nodes' pg_autoctl process connects to the new monitor version, the check for version
       compatibility fails, and the "node-active" sub-process exits. The main pg_autoctl process supervisor then
       restarts the "node-active" sub-process from its on-disk binary executable file, which has been upgraded
       to the new version. That's why we first install the new packages for pg_auto_failover on every node, and
       only then restart the monitor.

       IMPORTANT:
          Before  upgrading  the  monitor,  which is a simple restart of the pg_autoctl process, it is important
          that the OS packages for pgautofailover be updated on all the Postgres nodes.

          When that's not the case, pg_autoctl on the Postgres nodes will still detect a version  mismatch  with
          the  monitor extension, and the "node-active" sub-process will exit. And when restarted automatically,
          the same version of the local pg_autoctl binary executable is  found  on-disk,  leading  to  the  same
          version mismatch with the monitor extension.

          After restarting the "node-active" process 5 times, pg_autoctl quits retrying and stops. This includes
          stopping the Postgres service too, and a service downtime might then occur.

       When the upgrade is done we can use pg_autoctl show state on the monitor to see that everything is as
       expected.

   Upgrading from previous pg_auto_failover versions
       The new upgrade procedure described in the previous section is part  of  pg_auto_failover  since  version
       1.4.  When  upgrading  from a previous version of pg_auto_failover, up to and including version 1.3, then
       all the pg_autoctl processes have to be restarted fully.

       To prevent triggering a  failover  during  the  upgrade,  it's  best  to  put  your  secondary  nodes  in
       maintenance. The procedure then looks like the following:

          1. Enable maintenance on your secondary node(s):

                 pg_autoctl enable maintenance

          2. Upgrade the OS packages for pg_auto_failover on every node, as per previous section.

          3. Restart the monitor to upgrade it to the new pg_auto_failover version:
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 Then we may use the following commands to make sure that the service is running as expected:

                    sudo systemctl status pgautofailover
                    sudo journalctl -u pgautofailover

                 At  this  point it is expected that the pg_autoctl logs show that an upgrade has been performed
                 by using the ALTER EXTENSION pgautofailover UPDATE TO ... command. The monitor  is  ready  with
                 the new version of pg_auto_failover.

          4. Restart pg_autoctl on all Postgres nodes on the cluster.
                 When using the systemd integration, all we need to do is:

                    sudo systemctl restart pgautofailover

                 As in the previous point in this list, make sure the service is now running as expected.

          5. Disable maintenance on your secondary nodes(s):

                 pg_autoctl disable maintenance

   Extension dependencies when upgrading the monitor
       Since  version 1.4.0 the pgautofailover extension requires the Postgres contrib extension btree_gist. The
       pg_autoctl command arranges for the creation of this dependency, and has been buggy in some releases.

       As a result, you might have trouble upgrading the pg_auto_failover monitor to a recent version. It is
       possible to fix the error by connecting to the monitor Postgres database and running the create extension
       command manually:

          # create extension btree_gist;
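
       For instance, with a superuser connection to the monitor database (named pg_auto_failover by default),
       you can then check that both extensions are installed:

          # select extname, extversion from pg_extension where extname in ('btree_gist', 'pgautofailover');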

   Cluster Management and Operations
       It is possible to operate pg_auto_failover formations and groups directly from the monitor. All that is
       needed is access to the monitor Postgres database as a client, such as psql. It's also possible to add
       those management SQL function calls to your own ops application if you have one.

       For security reasons, the autoctl_node role is not allowed to perform maintenance operations. This user
       is limited to what pg_autoctl needs. You can either create a specific user and authentication rule to
       expose for management, or edit the default HBA rules for the autoctl user. In the following examples
       we're directly connecting as the autoctl role.
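
       For example, a management session on the monitor could be opened with psql; the host name below is a
       placeholder:

          $ psql "host=monitor.host port=5432 dbname=pg_auto_failover user=autoctl"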

       The  main  operations  with  pg_auto_failover  are  node maintenance and manual failover, also known as a
       controlled switchover.

   Maintenance of a secondary node
       It is possible to put a secondary node in any group in a MAINTENANCE state, so that the  Postgres  server
       is  not  doing  synchronous  replication  anymore and can be taken down for maintenance purposes, such as
       security kernel upgrades or the like.

       The command line tool pg_autoctl exposes an API to schedule maintenance operations on the  current  node,
       which must be a secondary node at the moment when maintenance is requested.

       Here's an example of using the maintenance commands on a secondary node, including the output. Of course,
       when you try that on your own nodes, dates and PID information might differ:

          $ pg_autoctl enable maintenance
          17:49:19 14377 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          17:49:19 14377 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          17:49:19 |   1 | localhost |   5001 |             primary |        wait_primary
          17:49:19 |   2 | localhost |   5002 |           secondary |    wait_maintenance
          17:49:19 |   2 | localhost |   5002 |    wait_maintenance |    wait_maintenance
          17:49:20 |   1 | localhost |   5001 |        wait_primary |        wait_primary
          17:49:20 |   2 | localhost |   5002 |    wait_maintenance |         maintenance
          17:49:20 |   2 | localhost |   5002 |         maintenance |         maintenance

       The  command  listens  to  the state changes in the current node's formation and group on the monitor and
       displays those changes as it receives them.  The  operation  is  done  when  the  node  has  reached  the
       maintenance state.

       It is now possible to disable maintenance to allow pg_autoctl to manage this standby node again:

          $ pg_autoctl disable maintenance
          17:49:26 14437 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          17:49:26 14437 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          17:49:27 |   2 | localhost |   5002 |         maintenance |          catchingup
          17:49:27 |   2 | localhost |   5002 |          catchingup |          catchingup
          17:49:28 |   2 | localhost |   5002 |          catchingup |           secondary
          17:49:28 |   1 | localhost |   5001 |        wait_primary |             primary
          17:49:28 |   2 | localhost |   5002 |           secondary |           secondary
          17:49:29 |   1 | localhost |   5001 |             primary |             primary

       When  a standby node is in maintenance, the monitor sets the primary node replication to WAIT_PRIMARY: in
       this role, the PostgreSQL streaming replication is now asynchronous and the standby PostgreSQL server may
       be stopped, rebooted, etc.
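
       If you want to double-check that behaviour, the synchronous_standby_names setting on the primary is
       emptied while no standby participates in synchronous replication; the port below matches the example
       output above:

          $ psql -p 5001 -d postgres -c 'show synchronous_standby_names'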

   Maintenance of a primary node
       A primary node must be available at all times in any formation and group in pg_auto_failover; that is the
       invariant provided by the whole solution. With that in mind, the only way to allow a primary node to go
       into maintenance mode is to first fail over and promote the secondary node.

       The  same command pg_autoctl enable maintenance implements that operation when run on a primary node with
       the option --allow-failover. Here is an example of such an operation:

          $ pg_autoctl enable maintenance
          11:53:03 50526 WARN  Enabling maintenance on a primary causes a failover
          11:53:03 50526 FATAL Please use --allow-failover to allow the command proceed

       As we can see, the --allow-failover option is mandatory. In the next example we use it:

          $ pg_autoctl enable maintenance --allow-failover
          13:13:42 1614 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:13:42 1614 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          13:13:43 |   2 | localhost |   5002 |             primary | prepare_maintenance
          13:13:43 |   1 | localhost |   5001 |           secondary |   prepare_promotion
          13:13:43 |   1 | localhost |   5001 |   prepare_promotion |   prepare_promotion
          13:13:43 |   2 | localhost |   5002 | prepare_maintenance | prepare_maintenance
          13:13:44 |   1 | localhost |   5001 |   prepare_promotion |    stop_replication
          13:13:45 |   1 | localhost |   5001 |    stop_replication |    stop_replication
          13:13:46 |   1 | localhost |   5001 |    stop_replication |        wait_primary
          13:13:46 |   2 | localhost |   5002 | prepare_maintenance |         maintenance
          13:13:46 |   1 | localhost |   5001 |        wait_primary |        wait_primary
          13:13:47 |   2 | localhost |   5002 |         maintenance |         maintenance

       When the operation is done we can have the old primary re-join the group, this time as a secondary:

          $ pg_autoctl disable maintenance
          13:14:46 1985 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          13:14:46 1985 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |       Current State |      Assigned State
          ---------+-----+-----------+--------+---------------------+--------------------
          13:14:47 |   2 | localhost |   5002 |         maintenance |          catchingup
          13:14:47 |   2 | localhost |   5002 |          catchingup |          catchingup
          13:14:52 |   2 | localhost |   5002 |          catchingup |           secondary
          13:14:52 |   1 | localhost |   5001 |        wait_primary |             primary
          13:14:52 |   2 | localhost |   5002 |           secondary |           secondary
          13:14:53 |   1 | localhost |   5001 |             primary |             primary

   Triggering a failover
       It is possible to trigger a manual failover, or  a  switchover,  using  the  command  pg_autoctl  perform
       failover. Here's an example of what happens when running the command:

          $ pg_autoctl perform failover
          11:58:00 53224 INFO  Listening monitor notifications about state changes in formation "default" and group 0
          11:58:00 53224 INFO  Following table displays times when notifications are received
              Time |  ID |      Host |   Port |      Current State |     Assigned State
          ---------+-----+-----------+--------+--------------------+-------------------
          11:58:01 |   1 | localhost |   5001 |            primary |           draining
          11:58:01 |   2 | localhost |   5002 |          secondary |  prepare_promotion
          11:58:01 |   1 | localhost |   5001 |           draining |           draining
          11:58:01 |   2 | localhost |   5002 |  prepare_promotion |  prepare_promotion
          11:58:02 |   2 | localhost |   5002 |  prepare_promotion |   stop_replication
          11:58:02 |   1 | localhost |   5001 |           draining |     demote_timeout
          11:58:03 |   1 | localhost |   5001 |     demote_timeout |     demote_timeout
          11:58:04 |   2 | localhost |   5002 |   stop_replication |   stop_replication
          11:58:05 |   2 | localhost |   5002 |   stop_replication |       wait_primary
          11:58:05 |   1 | localhost |   5001 |     demote_timeout |            demoted
          11:58:05 |   2 | localhost |   5002 |       wait_primary |       wait_primary
          11:58:05 |   1 | localhost |   5001 |            demoted |            demoted
          11:58:06 |   1 | localhost |   5001 |            demoted |         catchingup
          11:58:06 |   1 | localhost |   5001 |         catchingup |         catchingup
          11:58:08 |   1 | localhost |   5001 |         catchingup |          secondary
          11:58:08 |   2 | localhost |   5002 |       wait_primary |            primary
          11:58:08 |   1 | localhost |   5001 |          secondary |          secondary
          11:58:08 |   2 | localhost |   5002 |            primary |            primary

       Again,  timings  and  PID  numbers  are  not expected to be the same when you run the command on your own
       setup.

       Also note in the output that the command shows the whole  set  of  transitions  including  when  the  old
       primary is now a secondary node. The database is available for read-write traffic as soon as we reach the
       state wait_primary.
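
       Note that applications do not usually need to track which node is the primary themselves: they can use
       the connection string reported by the monitor, for instance:

          $ pg_autoctl show uri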

   Implementing a controlled switchover
       It is generally useful to distinguish a controlled switchover from a failover. In a controlled switchover
       situation it is possible to organise the sequence of events in a way that avoids data loss and keeps
       downtime to a minimum.

       In  the  case  of pg_auto_failover, because we use synchronous replication, we don't face data loss risks
       when triggering a manual failover. Moreover, our monitor knows the current primary  health  at  the  time
       when the failover is triggered, and drives the failover accordingly.

       So  to  trigger  a  controlled  switchover with pg_auto_failover you can use the same API as for a manual
       failover:

          $ pg_autoctl perform switchover

       Because the subtleties of orchestrating either a controlled switchover or an unplanned failover are all
       handled by the monitor, rather than by the client-side command line, at the client level the two commands
       pg_autoctl perform failover and pg_autoctl perform switchover are synonyms, or aliases.

   Current state, last events
       The following commands display information from the pg_auto_failover monitor  tables  pgautofailover.node
       and pgautofailover.event:

          $ pg_autoctl show state
          $ pg_autoctl show events

       When run on the monitor, the commands output all the known states and events for the whole set of
       formations handled by the monitor. When run on a PostgreSQL node, the commands connect to the monitor and
       output the information relevant to the service group of the local node only.

       For interactive debugging it is helpful to run the following command from the  monitor  node  while  e.g.
       initializing a formation from scratch, or performing a manual failover:

          $ watch pg_autoctl show state

   Monitoring pg_auto_failover in Production
       The monitor reports every state change decision to a LISTEN/NOTIFY channel named state. Those state
       change events are also stored on the monitor in a table, pgautofailover.event, and broadcast by NOTIFY
       in the channel named log.
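
       As an illustration, a monitoring client connected to the monitor Postgres database could subscribe to
       those channels and query past events; this is a sketch only:

          -- in a psql session connected to the monitor database
          LISTEN state;
          LISTEN log;

          -- past events can also be queried from the events table,
          -- or displayed with: pg_autoctl show events
          select * from pgautofailover.event;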

   Replacing the monitor online
       When  the  monitor  node  is  not available anymore, it is possible to create a new monitor node and then
       switch existing nodes to a new monitor by using the following commands.

           1. Apply the STONITH approach on the old monitor to make sure this node is not going to show up again
              during the procedure. This step is sometimes referred to as “fencing”.

          2. On  every node, ending with the (current) Postgres primary node for each group, disable the monitor
             while pg_autoctl is still running:

                 $ pg_autoctl disable monitor --force

          3. Create a new monitor node:

                 $ pg_autoctl create monitor ...

          4. On the current primary node first, so that it's registered first and as a primary still,  for  each
             group in your formation(s), enable the monitor online again:

                 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover

          5. On every other (secondary) node, enable the monitor online again:

                 $ pg_autoctl enable monitor postgresql://autoctl_node@.../pg_auto_failover

       See pg_autoctl_disable_monitor and pg_autoctl_enable_monitor for details about those commands.

       This operation relies on the fact that pg_autoctl can be operated without a monitor, and that when
       reconnecting to a new monitor, this process resets the parts of the node state that come from the
       monitor, such as the node identifier.

   Trouble-Shooting Guide
       pg_auto_failover  commands  can be run repeatedly. If initialization fails the first time -- for instance
       because a firewall  rule  hasn't  yet  activated  --  it's  possible  to  try  pg_autoctl  create  again.
       pg_auto_failover  will  review  its  previous progress and repeat idempotent operations (create database,
       create extension etc), gracefully handling errors.

FREQUENTLY ASKED QUESTIONS

       Those questions have been asked in GitHub issues for the project by several  people.  If  you  have  more
       questions, feel free to open a new issue, and your question and its answer might make it to this FAQ.

   I stopped the primary and no failover is happening for 20s to 30s, why?
       In  order  to  avoid  spurious  failovers  when  the network connectivity is not stable, pg_auto_failover
       implements a timeout of 20s before acting on a node that is known unavailable. This needs to be added  to
       the delay between health checks and the retry policy.

       See the configuration part for more information about how to setup the different delays and timeouts that
       are involved in the decision making.
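
       For reference, the current values of those timeouts and retry settings can be listed directly on the
       monitor node; this query is only one way to inspect them:

          $ psql -d pg_auto_failover \
                 -c "select name, setting, unit from pg_settings where name like 'pgautofailover.%'"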

       See also pg_autoctl_watch for a dashboard that helps with understanding the system and what's going on at
       the moment.

   The secondary is blocked in the CATCHING_UP state, what should I do?
       In the pg_auto_failover design, the following two things are needed for the monitor to be able to
       orchestrate a node's integration completely:

          1. Health Checks must be successful

             The  monitor  runs periodic health checks with all the nodes registered in the system. Those health
             checks are Postgres connections from the monitor to the registered  Postgres  nodes,  and  use  the
             hostname and port as registered.

              The pg_autoctl show state command's Reachable column contains "yes" when the monitor could connect
              to a specific node, "no" when this connection failed, and "unknown" when no connection has been
              attempted yet since the last startup time of the monitor.

              The Reachable column of the pg_autoctl show state command output must show a "yes" entry before a
              new standby node can be orchestrated up to the "secondary" goal state.

          2. pg_autoctl service must be running

             The  pg_auto_failover  monitor  works  by  assigning  goal states to individual Postgres nodes. The
             monitor will not assign a new goal state until the current one has been reached.

             To implement a transition from the current state to the goal state assigned  by  the  monitor,  the
             pg_autoctl service must be running on every node.

       When your new standby node stays in the "catchingup" state for a long time, please check that the node is
       reachable  from  the  monitor  given  its  hostname  and  port  known  on the monitor, and check that the
       pg_autoctl run command is running for this node.
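
       In practice, a couple of quick checks can help; the host name and port below are placeholders, and the
       service name assumes the systemd integration shown earlier:

          # from the monitor node, check that the standby accepts connections
          $ pg_isready -h node2.db -p 5432

          # on the standby node, check that the pg_autoctl service is running
          $ sudo systemctl status pgautofailover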

       When things are not obvious, the next step is to go read the logs. Both  the  output  of  the  pg_autoctl
       command  and the Postgres logs are relevant. See the Should I read the logs? Where are the logs? question
       for details.

   Should I read the logs? Where are the logs?
       Yes. If anything seems strange to you, please do read the logs.

       As maintainers of the pg_autoctl tool, we can't foresee everything that may happen to your production
       environment. Still, a lot of effort is spent on producing meaningful output. So when you're in a
       situation that's hard to understand, please make sure to read the pg_autoctl logs and the Postgres logs.

       When using the systemd integration, the pg_autoctl logs are handled entirely by the journal facility of
       systemd. Please refer to journalctl for viewing the logs.
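
       For example, assuming the service unit is named pgautofailover as in the examples above, the logs can be
       followed with:

          $ sudo journalctl -u pgautofailover -f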

       The Postgres logs are to be found in the $PGDATA/log directory with the default configuration deployed by
       pg_autoctl  create  ....  When a custom Postgres setup is used, please refer to your actual setup to find
       Postgres logs.

   The state of the system is blocked, what should I do?
       This question is a general case situation that is similar in nature to the  previous  situation,  reached
       when  adding  a new standby to a group of Postgres nodes. Please check the same two elements: the monitor
       health checks are successful, and the pg_autoctl run command is running.

       When things are not obvious, the next step is to go read the logs. Both  the  output  of  the  pg_autoctl
       command  and the Postgres logs are relevant. See the Should I read the logs? Where are the logs? question
       for details.

   The monitor is a SPOF in pg_auto_failover design, how should we handle that?
       When using pg_auto_failover, the monitor is needed to make decisions and orchestrate changes in  all  the
       registered  Postgres  groups.  Decisions  are  transmitted to the Postgres nodes by the monitor assigning
       nodes a goal state which is different from their current state.

   Consequences of the monitor being unavailable
       Nodes contact the monitor each second and call the node_active stored procedure,  which  returns  a  goal
       state that is possibly different from the current state.

       The monitor only assigns a new goal state to a Postgres node when a cluster-wide operation is needed. In
       practice, only the following operations require the monitor to assign a new goal state to a Postgres
       node:

          • a new node is registered

          • a failover needs to happen, either triggered automatically or manually

          • a node is being put to maintenance

          • a node replication setting is being changed.

       When the monitor node is not available, the pg_autoctl processes on the Postgres nodes fail to contact
       the monitor every second, and log that failure. In addition, no orchestration is possible while the
       monitor is down.

       The Postgres streaming replication does not need the monitor to be available  in  order  to  deliver  its
       service  guarantees  to your application, so your Postgres service is still available when the monitor is
       not available.

       To repair your installation after having lost a monitor, the following scenarios are to be considered.

   The monitor node can be brought up again without data having been lost
       This is typically the case in Cloud Native environments such  as  Kubernetes,  where  you  could  have  a
       service  migrated  to  another pod and re-attached to its disk volume. This scenario is well supported by
       pg_auto_failover, and no intervention is needed.

       It is also possible to use synchronous archiving with the monitor so that it's possible to  recover  from
       the  current  archives  and  continue  operating  without  intervention on the Postgres nodes, except for
       updating their monitor URI. This requires an archiving setup that uses synchronous  replication  so  that
       any transaction committed on the monitor is known to have been replicated in your WAL archive.

       At  the  moment, you have to take care of that setup yourself. Here's a quick summary of what needs to be
       done:

          1. Schedule base backups

             Use pg_basebackup every once in a while to have a  full  copy  of  the  monitor  Postgres  database
             available.

          2. Archive WAL files in a synchronous fashion

              Use pg_receivewal --sync ... as a service to keep a WAL archive in sync with the monitor Postgres
              instance at all times. A minimal sketch of such a setup follows this list.

          3. Prepare a recovery tool on top of your archiving strategy

             Write a utility that knows how to create a new monitor node from  your  most  recent  pg_basebackup
             copy and the WAL files copy.

             Bonus  points if that tool/script is tested at least once a day, so that you avoid surprises on the
             unfortunate day that you actually need to use it in production.
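
       Here is a minimal sketch of such a setup, assuming a monitor host named monitor.host, a role with the
       REPLICATION privilege named replication_user, and an archive filesystem mounted at /archive; all of
       those names are placeholders and the options should be adapted to your environment:

          # take a full base backup of the monitor every so often (cron job, systemd timer, ...)
          pg_basebackup -h monitor.host -U replication_user \
              --format=tar --wal-method=stream \
              -D /archive/basebackup/$(date +%Y%m%d)

          # keep a synchronous WAL archive, running as a long-lived service
          pg_receivewal -h monitor.host -U replication_user \
              --synchronous -D /archive/wal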

       A future version of pg_auto_failover will include this facility, but the current versions don't.

   The monitor node can only be built from scratch again
       If you don't have synchronous archiving for the monitor set up, then you might not be able to restore a
       monitor database with the expected up-to-date node metadata. Specifically, we need the nodes' state to be
       in sync with what each pg_autoctl process received the last time it could contact the monitor, before
       the monitor became unavailable.

       It is possible to register nodes that are currently running to a new monitor without restarting Postgres
       on the primary. For that, the procedure mentioned in replacing_monitor_online must be followed, using
       the following commands:

          $ pg_autoctl disable monitor
          $ pg_autoctl enable monitor

AUTHOR

       Microsoft

COPYRIGHT

       Copyright (c) Microsoft Corporation. All rights reserved.

1.6                                               Nov 09, 2021                               PG_AUTO_FAILOVER(1)