Provided by: percona-toolkit_3.2.1-1_all

NAME

       pt-diskstats - An interactive I/O monitoring tool for GNU/Linux.

SYNOPSIS

       Usage: pt-diskstats [OPTIONS] [FILES]

       pt-diskstats prints disk I/O statistics for GNU/Linux.  It is somewhat similar to iostat, but it is
       interactive and more detailed.  It can analyze samples gathered from another machine.

RISKS

       Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a
       risk to the system and the database server.  Before using this tool, please:

       •   Read the tool's documentation

       •   Review the tool's known "BUGS"

       •   Test the tool on a non-production server

       •   Back up your production server and verify the backups

DESCRIPTION

       The  pt-diskstats tool is similar to iostat, but has some advantages. It prints read and write statistics
       separately, and has more columns. It is menu-driven and  interactive,  with  several  different  ways  to
       aggregate the data. It integrates well with the pt-stalk tool. It also does the "right thing" by default,
       such  as  hiding disks that are idle.  These properties make it very convenient for quickly drilling down
       into I/O performance and inspecting disk behavior.

       This program works in two modes. The default is to collect samples of /proc/diskstats and print  out  the
       formatted  statistics  at  intervals.  The other mode is to process a file that contains saved samples of
       /proc/diskstats; there is a shell script later in this documentation that shows how  to  collect  such  a
       file.

       In  both  cases,  the  tool is interactively controlled by keystrokes, so you can redisplay and slice the
       data flexibly and easily.  It loops forever, until you exit with the 'q' key.  If you press the '?'  key,
       you will bring up the interactive help menu that shows which keys control the program.

       When  the  program  is  gathering  samples  of  /proc/diskstats  and  refreshing  its  display, it prints
       information about the newest sample each time it refreshes.  When it is operating  on  a  file  of  saved
       samples, it redraws the entire file's contents every time you change an option.

       The  program  doesn't  print information about every block device on the system. It hides devices that it
       has never observed to have any activity.  You can enable and disable this by pressing the 'i' key.

OUTPUT

       In the rest of this documentation,  we  will  try  to  clarify  the  distinction  between  block  devices
       (/dev/sda1,  for  example),  which  the  kernel  presents to the application via a filesystem, versus the
       (usually) physical device underneath the block device, which could be a disk, a RAID controller,  and  so
       on.   We will sometimes refer to logical I/O operations, which occur at the block device, versus physical
       I/Os which are performed on the underlying device.  When we refer to the queue, we are  speaking  of  the
       queue associated with the block device, which holds requests until they're issued to the physical device.

       The  program's output looks like the following sample, which is too wide for this manual page, so we have
       formatted it as several samples with line breaks:

         #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt
         {6} sda     0.9     4.2     0.0     0%    0.0    17.9
         {6} sdb     0.4     4.0     0.0     0%    0.0    26.1
         {6} dm-0    0.0     4.0     0.0     0%    0.0    13.5
         {6} dm-1    0.8     4.0     0.0     0%    0.0    16.0

             ...    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt
             ...    99.7     6.2     0.6    35%    3.7    23.7
             ...    14.5    15.8     0.2    75%    0.5     9.2
             ...     1.0     4.0     0.0     0%    0.0     2.3
             ...   117.7     4.0     0.5     0%    4.1    35.1

             ...              busy in_prg    io_s  qtime stime
             ...                6%      0   100.6   23.3   0.4
             ...                4%      0    14.9    8.6   0.6
             ...                0%      0     1.1    1.5   1.2
             ...                5%      0   118.5   34.5   0.4

       The columns are as follows:

       #ts This column's contents vary depending on the tool's aggregation mode.  In "disk" group-by mode, when
           each line contains information about a single disk but possibly aggregates across several samples
           from that disk, this column shows the number of samples that were included in the line of output, in
           {curly braces}.  In the example shown, each line of output aggregates {6} samples of
           /proc/diskstats.

           In  the "all" group-by mode, this column shows timestamp offsets, relative to the time the tool began
           aggregating or the timestamp of the previous lines printed, depending on the mode.  The output can be
           confusing to explain, but it's rather intuitive when you see  the  lines  appearing  on  your  screen
           periodically.

           Similarly,  in  "sample" group-by mode, the number indicates the total time span that is grouped into
           each sample.

           If you specify "--show-timestamps", this field instead shows the timestamp at which  the  sample  was
           taken;  if  multiple  timestamps  are present in a single line of output, then the first timestamp is
           used.

       device
           The device name.  If there is more than one device, then instead the  number  of  devices  aggregated
           into the line is shown, in {curly braces}.

       rd_s
           The  average  number  of  reads per second.  This is the number of I/O requests that were sent to the
           underlying device.  This usually is a smaller number than the number of logical I/O requests made by
           applications.  More requests might have been queued to the block device, but some of them usually are
           merged before being sent to the disk.

           This  field  is computed from the contents of /proc/diskstats as follows.  See "KERNEL DOCUMENTATION"
           below for the meaning of the field numbers:

              delta[field1] / delta[time]

       rd_avkb
           The average size of the reads, in kilobytes.  The kernel counts sectors of 512 bytes, so two sectors
           make one kilobyte.  This field is computed as follows:

              delta[field3] / 2 / delta[field1]

       rd_mb_s
           The average number of megabytes read per second.  Computed as follows:

              delta[field3] / 2 / 1024 / delta[time]

       rd_mrg
           The percentage of read requests that were merged together in the queue scheduler before being sent to
           the physical device.  The field is computed as follows:

              100 * delta[field2] / (delta[field2] + delta[field1])

       rd_cnc
           The average concurrency of the read operations, as computed by Little's Law.  This is the  end-to-end
           concurrency on the block device, not the underlying disk's concurrency. It includes time spent in the
           queue.  The field is computed as follows:

              delta[field4] / delta[time] / 1000 / devices-in-group

       rd_rt
           The  average  response time of the read operations, in milliseconds.  This is the end-to-end response
           time, including time spent in the queue.  It is the response time that  the  application  making  I/O
           requests  sees,  not  the  response  time  of  the  physical disk underlying the block device.  It is
           computed as follows:

              delta[field4] / (delta[field1] + delta[field2])

       wr_s, wr_avkb, wr_mb_s, wr_mrg, wr_cnc, wr_rt
           These columns show write activity, and they match the corresponding columns for read activity.

       busy
           The fraction of wall-clock time that the device had at least one request in progress;  this  is  what
           iostat  calls  %util, and indeed it is utilization, depending on how you define utilization, but that
           is sometimes ambiguous in common parlance.  It may also be called the residence time; the time during
           which at least one request was resident in the system.  It is computed as follows:

              100 * delta[field10] / (1000 * delta[time])

           This field cannot exceed 100% unless there is a rounding error, but it is a common mistake  to  think
           that  a  device that's busy all the time is saturated.  A device such as a RAID volume should support
           concurrency higher than 1, and solid-state drives can support very high concurrency.  Concurrency can
           grow without bound, and is a more reliable indicator of how loaded the device really is.

       in_prg
           The number of requests that were in progress.  Unlike the read and  write  concurrencies,  which  are
           averages  that  are  generated from reliable numbers, this number is an instantaneous sample, and you
           can see that it might represent a spike of requests, rather than the true long-term average.  If this
           number is large, it essentially means that the device is heavily loaded.  It is computed as follows:

              field9

       io_s
           The average throughput of the physical device, in I/O operations  per  second  (IOPS).   This  column
           shows the total IOPS the underlying device is handling.  It is the sum of rd_s and wr_s.

       qtime
           The  average  queue  time;  that is, time a request spends in the device scheduler queue before being
           sent to the physical device.  This is an average over reads and writes.

           It is computed in a slightly complex way: the average response time seen by  the  application,  minus
           the average service time (see the description of the next column).  This is derived from the queueing
           theory  formula  for  response  time,  R = W + S: response time = queue time + service time.  This is
           solved for W, of course, to give W = R - S.  The computation follows:

              delta[field11] / (delta[field1, 2, 5, 6] + delta[field9])
                 - delta[field10] / delta[field1, 2, 5, 6]

           See the description for "stime" for more details and cautions.

       stime
           The average service time; that is, the time elapsed while the physical device processes the  request,
           after  the  request  finishes waiting in the queue.  This is an average over reads and writes.  It is
           computed from the queueing theory utilization formula, U  =  SX,  solved  for  S.   This  means  that
           utilization divided by throughput gives service time:

              delta[field10] / (delta[field1, 2, 5, 6])

           Note,  however,  that  there  can be some kernel bugs that cause field 9 in /proc/diskstats to become
           negative, and this can cause field 10 to be wrong, thus  making  the  service  time  computation  not
           wholly trustworthy.

           Note  that  in  the  above  formula  we  use  utilization  very specifically. It is a duration, not a
           percentage.

           You can compare the stime and qtime columns to see whether the response time for reads and writes  is
           spent  in  the queue or on the physical device.  However, you cannot see the difference between reads
           and writes.  Changing the block device scheduler algorithm might improve queue time greatly.  The
           cfq algorithm, which was the default on many older kernels, is very bad for servers, and should only
           be used on laptops and workstations that perform tasks such as working with spreadsheets and surfing
           the Internet.
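
       The column formulas above can be sketched in a few lines of shell and awk.  This is only an
       illustration under stated assumptions, not the tool's actual implementation: diskstats_columns is a
       hypothetical helper name, the "before" line is invented for the example, and the read size assumes
       512-byte sectors as described in "KERNEL DOCUMENTATION".  The qtime column is left out.

```shell
# diskstats_columns BEFORE_LINE AFTER_LINE ELAPSED_SECONDS
# Hypothetical helper: derive a few pt-diskstats-style columns from two
# /proc/diskstats lines taken for the same device.
diskstats_columns() {
    printf '%s\n%s\n' "$1" "$2" | awk -v secs="$3" '
        NR == 1 { for (i = 4; i <= 14; i++) before[i] = $i; next }
        {
            # Kernel statistics field N is awk column N + 3, because the
            # major number, minor number, and device name come first.
            for (i = 4; i <= 14; i++) d[i - 3] = $i - before[i]
            rd  = d[1] + d[2]                 # reads completed + reads merged
            ios = d[1] + d[2] + d[5] + d[6]   # denominator of the stime formula
            printf "rd_s=%.1f ",     d[1] / secs
            printf "rd_avkb=%.1f ",  (d[1] > 0 ? d[3] / 2 / d[1] : 0)
            printf "rd_mrg=%.0f%% ", (rd > 0 ? 100 * d[2] / rd : 0)
            printf "rd_rt=%.1f ",    (rd > 0 ? d[4] / rd : 0)
            printf "busy=%.0f%% ",   100 * d[10] / (1000 * secs)
            printf "io_s=%.1f ",     (d[1] + d[5]) / secs
            printf "stime=%.1f\n",   (ios > 0 ? d[10] / ios : 0)
        }'
}

# The "before" line is made up; the "after" line is the sample from
# KERNEL DOCUMENTATION.  The two are assumed to be 10 seconds apart.
diskstats_columns \
    '8 1 sda1 326 193 2586 1456 1 0 2 30 0 1635 1600' \
    '8 1 sda1 426 243 3386 2056 3 0 18 87 0 2135 2142' \
    10
```

       With these inputs the deltas are 100 reads, 50 merged reads, 800 sectors, and 500 ms of busy time,
       so the sketch prints rd_s=10.0 rd_avkb=4.0 rd_mrg=33% rd_rt=4.0 busy=5% io_s=10.2 stime=3.3.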

       If you are used to using iostat, you might wonder where you can find the same information in
       pt-diskstats.  Here are two samples of output from both tools on the same machine at the same time,
       for /dev/sda, wrapped to fit:

               #ts dev rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt
          08:50:10 sda  0.0     0.0     0.0     0%    0.0     0.0
          08:50:20 sda  0.4     4.0     0.0     0%    0.0    15.5
          08:50:30 sda  2.1     4.4     0.0     0%    0.0    21.1
          08:50:40 sda  2.4     4.0     0.0     0%    0.0    15.4
          08:50:50 sda  0.1     4.0     0.0     0%    0.0    33.0

                       wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt
                        7.7    25.5     0.2    84%    0.0     0.3
                       49.6     6.8     0.3    41%    2.4    28.8
                      210.1     5.6     1.1    28%    7.4    25.2
                      297.1     5.4     1.6    26%   11.4    28.3
                       11.9    11.7     0.1    66%    0.2     4.9

                               busy  in_prg   io_s  qtime   stime
                                 1%       0    7.7    0.1     0.2
                                 6%       0   50.0   28.1     0.7
                                12%       0  212.2   24.8     0.4
                                16%       0  299.5   27.8     0.4
                                 1%       0   12.0    4.7     0.3

                   Dev rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s
          08:50:10 sda   0.00   41.40  0.00   7.70   0.00   0.19
          08:50:20 sda   0.00   34.70  0.40  49.60   0.00   0.33
          08:50:30 sda   0.00   83.30  2.10 210.10   0.01   1.15
          08:50:40 sda   0.00  105.10  2.40 297.90   0.01   1.58
          08:50:50 sda   0.00   22.50  0.10  11.10   0.00   0.13

                          avgrq-sz avgqu-sz  await  svctm  %util
                             51.01     0.02   2.04   1.25   0.96
                             13.55     2.44  48.76   1.16   5.79
                             11.15     7.45  35.10   0.55  11.76
                             10.81    11.40  37.96   0.53  15.97
                             24.07     0.17  15.60   0.87   0.97

       The correspondence between the columns is not one-to-one.  In particular:

       rrqm/s, wrqm/s
           These columns in iostat are replaced by rd_mrg and wr_mrg in pt-diskstats.

       avgrq-sz
           This column is in sectors in iostat, and is a combination of  reads  and  writes.   The  pt-diskstats
           output breaks these out separately and shows them in kB.  You can derive it via a weighted average of
           rd_avkb  and  wr_avkb  in  pt-diskstats,  and  then  multiply by 2 to get sectors (each sector is 512
           bytes).

       avgqu-sz
           This column really represents concurrency at the block device  scheduler.   The  pt-diskstats  output
           shows concurrency for reads and writes separately: rd_cnc and wr_cnc.

       await
           This column is the average response time from the beginning to the end of a request to the block
           device, including queue time and service time, and is not shown in pt-diskstats.  Instead,
           pt-diskstats shows separate end-to-end response times for reads and writes (rd_rt and wr_rt),
           as well as queue time versus service time for reads and writes in aggregate.

       svctm
           This column is the average service time at the disk, and is shown as stime in pt-diskstats.

       %util
           This column is called busy in pt-diskstats.  Utilization is usually defined as the  portion  of  time
           during  which  there  was  at least one active request, not as a percentage, which is why we chose to
           avoid this confusing term.
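
       The avgrq-sz conversion described above can be checked against the sample output.  The following is
       a minimal sketch; avgrq_sz is a hypothetical helper name, and it weights each average size by its
       operations per second, which is our reading of "weighted average".

```shell
# avgrq_sz RD_S RD_AVKB WR_S WR_AVKB
# Hypothetical helper: combined average request size in 512-byte sectors.
avgrq_sz() {
    awk -v rs="$1" -v rk="$2" -v ws="$3" -v wk="$4" 'BEGIN {
        total = rs + ws
        if (total == 0) { print "0.0"; exit }
        avg_kb = (rs * rk + ws * wk) / total   # weight each size by its IOPS
        printf "%.1f\n", avg_kb * 2            # one kilobyte is two sectors
    }'
}

# Values from the 08:50:20 row of the pt-diskstats sample.
avgrq_sz 0.4 4.0 49.6 6.8
```

       This prints 13.6, close to the 13.55 that iostat reported for the same interval.  The small
       difference comes from starting with values that were already rounded for display.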

COLLECTING DATA

       It is straightforward to gather a sample of data for this tool.  Files should have this  format,  with  a
       timestamp line preceding each sample of statistics:

          TS <timestamp>
          <contents of /proc/diskstats>
          TS <timestamp>
          <contents of /proc/diskstats>
          ... et cetera

       You  can  simply  use  pt-diskstats  with  "--save-samples" to collect this data for you.  If you wish to
       capture samples as part of some other tool, and use pt-diskstats to  analyze  them,  you  can  include  a
       snippet of shell script such as the following:

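          INTERVAL=1   # seconds between samples
          while true; do
             # Sleep until the next whole multiple of INTERVAL on the wall clock
             sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
             sleep "$sleep"
             # Write the timestamp line first, then the raw sample
             date +"TS %s.%N %F %T" >> diskstats-samples.txt
             cat /proc/diskstats >> diskstats-samples.txt
          done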
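
       Before analyzing a collected file, it can help to confirm that it has the expected shape.  The
       following sketch counts timestamp lines and device lines; count_samples is a hypothetical helper
       name, and the file contents shown are invented for illustration.

```shell
# count_samples FILE
# Hypothetical helper: print "<samples> <device-lines>" for a saved-samples file.
count_samples() {
    awk '
        $1 == "TS" { samples++; next }   # a timestamp line starts each sample
        NF >= 14   { lines++ }           # major, minor, name, then 11 or more statistics
        END { printf "%d %d\n", samples, lines }
    ' "$1"
}

# Build a tiny two-sample file and check it.
tmp=$(mktemp)
printf '%s\n' \
    'TS 1598774400.000000000 2020-08-30 08:00:00' \
    '8 0 sda 426 243 3386 2056 3 0 18 87 0 2135 2142' \
    'TS 1598774401.000000000 2020-08-30 08:00:01' \
    '8 0 sda 430 243 3418 2070 5 0 34 90 0 2140 2159' > "$tmp"
count_samples "$tmp"
rm -f "$tmp"
```

       For this file the sketch prints "2 2": two samples, each with one device line.  A file in which
       the first number is zero has no "TS" lines and does not match the format shown above.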

KERNEL DOCUMENTATION

       This documentation supplements the official documentation
       <http://www.kernel.org/doc/Documentation/iostats.txt> on the contents of /proc/diskstats.  That
       documentation  can  sometimes be difficult to understand for those who are not familiar with Linux kernel
       internals.  The contents of /proc/diskstats are generated  by  the  "diskstats_show()"  function  in  the
       kernel source file block/genhd.c.

       Here is a sample of /proc/diskstats on a recent kernel.

          8 1 sda1 426 243 3386 2056 3 0 18 87 0 2135 2142

       The  fields in this sample are as follows.  The first three fields are the major and minor device numbers
       (8, 1), and the device name (sda1). They are followed by 11 fields of statistics:

       1.  The number of reads completed.  This is the number of physical reads done by the underlying disk, not
           the number of reads that applications made from the block device.  This means that 426  actual  reads
           have completed successfully to the disk on which /dev/sda1 resides.  Reads are not counted until they
           complete.

       2.  The  number  of  reads merged because they were adjacent.  In the sample, 243 reads were merged. This
           means that /dev/sda1 actually received 669 logical reads, but sent only 426  physical  reads  to  the
           underlying physical device.

       3.  The  number  of  sectors  read  successfully.   The 426 physical reads to the disk read 3386 sectors.
           Sectors are 512 bytes, so a total of about 1.65MB have been read from /dev/sda1.

       4.  The number of milliseconds spent reading.  This counts only reads that have completed, not reads that
           are in progress.  It counts the time spent from when requests are placed  on  the  queue  until  they
           complete,  not  the time that the underlying disk spends servicing the requests. That is, it measures
           the total response time seen by applications, not disk response times.

       5.  Ditto for field 1, but for writes.

       6.  Ditto for field 2, but for writes.

       7.  Ditto for field 3, but for writes.

       8.  Ditto for field 4, but for writes.

       9.  The number of I/Os currently in progress, that is, they've been scheduled by the queue scheduler  and
           issued to the disk (submitted to the underlying disk's queue), but not yet completed.  There are bugs
           in some kernels that cause this number, and thus fields 10 and 11, to be wrong sometimes.

       10. The  total  number of milliseconds spent doing I/Os.  This is not the total response time seen by the
           applications; it is the total amount of time during which at least one I/O was in progress.   If  one
           I/O is issued at time 100, another comes in at 101, and both of them complete at 102, then this field
           increments by 2, not 3.

       11. This  field  counts  the  total response time of all I/Os.  In contrast to field 10, it counts double
           when two I/Os overlap.  In our previous example, this field would increment by 3, not 2.
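
       The arithmetic in this section can be reproduced with awk.  The first command decodes the sample
       line; the second is a simplified model of how fields 10 and 11 treat overlapping I/Os.  Both are
       sketches: io_time_fields is a hypothetical helper name, and the model assumes each I/O is given as
       a start and end time in milliseconds, sorted by start time.

```shell
# Logical reads are fields 1 + 2; sectors (field 3) are 512 bytes each.
echo '8 1 sda1 426 243 3386 2056 3 0 18 87 0 2135 2142' | awk '{
    printf "logical_reads=%d mb_read=%.2f\n", $4 + $5, $6 * 512 / 1048576
}'

# io_time_fields: read "start end" pairs on stdin and print
# "<field 10 increment> <field 11 increment>".
io_time_fields() {
    awk '
        NR == 1 || $1 > hi {        # a gap: close the busy period, start a new one
            busy += hi - lo; lo = $1; hi = $2
        }
        $2 > hi { hi = $2 }         # an overlapping I/O extends the busy period
        { total += $2 - $1 }        # field 11 adds every I/O in full
        END { busy += hi - lo; printf "%d %d\n", busy, total }
    '
}

# The example from the text: I/Os at 100..102 and 101..102.
printf '100 102\n101 102\n' | io_time_fields
```

       The first command prints logical_reads=669 mb_read=1.65.  The second prints "2 3": the device was
       busy for 2 ms in total, while the summed response time of the two I/Os is 3 ms.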

OPTIONS

       This tool accepts additional command-line arguments.  Refer to the "SYNOPSIS" and usage  information  for
       details.

       --columns-regex
           type: string; default: .

           Print columns that match this Perl regex.

       --config
           type: Array

           Read  this  comma-separated  list of config files; if specified, this must be the first option on the
           command line.

       --devices-regex
           type: string

           Print devices that match this Perl regex.

       --group-by
           type: string; default: all

           Group-by mode: disk, sample, or all.  In disk mode, each line of output shows one disk  device,  with
           the statistics computed since the tool started.  In sample mode, each line of output shows one sample
           of  statistics,  with all disks averaged together.  In all mode, each line of output shows one sample
           and one disk device.

       --headers
           type: Hash; default: group,scroll

           If "group" is present, each sample will be separated by a blank line, unless the sample is  only  one
           line.   If  "scroll"  is  present, the tool will print the headers as often as needed to prevent them
           from scrolling out of view. Note that you can press the space bar,  or  the  enter  key,  to  reprint
           headers at will.

       --help
           Show help and exit.

       --interval
           type: int; default: 1

           When  in  interactive  mode,  wait N seconds before printing to the screen.  Also, how often the tool
           should sample /proc/diskstats.

           The tool attempts to gather statistics exactly on even intervals of clock  time.   That  is,  if  you
           specify a 5-second interval, it will try to capture samples at 12:00:00, 12:00:05, and so on; it will
           not gather at 12:00:01, 12:00:06 and so forth.

           This  can  lead  to  slightly odd delays in some circumstances, because the tool waits one full cycle
           before printing out the first set of lines. (Unlike iostat and vmstat, pt-diskstats  does  not  start
           with  a  line  representing  the averages since the computer was booted.)  Therefore, the rule has an
           exception to avoid very long delays.  Suppose you specify a 10-second interval,  but  you  start  the
           tool  at  12:00:00.01.  The tool might wait until 12:00:20 to print its first lines of output, and in
           the intervening 19.99 seconds, it would appear to do nothing.

           To alleviate this, the tool waits until the next even interval of time to gather, unless more than
           20% of that interval remains.  This means the tool will never wait more than 120% of the sampling
           interval to produce output.  For example, if you start the tool at 12:00:53 with a 10-second
           sampling interval, then the first sample will be only 7 seconds long, not 10 seconds.

       --iterations
           type: int

           When in interactive mode, stop after N samples.  Run forever by default.

       --sample-time
           type: int; default: 1

           In --group-by sample mode, include N seconds of samples per group.

       --save-samples
           type: string

           File to save diskstats samples in; these can be used for later analysis.

       --show-inactive
           Show inactive devices.

       --show-timestamps
           Show a 'HH:MM:SS' timestamp in the "#ts" column.  If multiple  timestamps  are  aggregated  into  one
           line, the first timestamp is shown.

       --version
           Show version and exit.

       --[no]version-check
           default: yes

           Check for the latest version of Percona Toolkit, MySQL, and other programs.

           This  is  a standard "check for updates automatically" feature, with two additional features.  First,
           the tool checks its own version and also the versions of the following  software:  operating  system,
           Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver for Perl (DBD::mysql), and Percona
           Toolkit.  Second,  it  checks  for  and  warns about versions with known problems. For example, MySQL
           5.5.25 had a critical bug and was re-released as 5.5.25a.

           The tool makes a secure connection to Percona’s Version Check database server to perform these
           checks.  The server logs each request, including software version numbers and the unique ID of the
           checked system.  The ID is generated by the Percona Toolkit installation script or when the Version
           Check database call is made for the first time.

           Any  updates  or  known problems are printed to STDOUT before the tool's normal output.  This feature
           should never interfere with the normal operation of the tool.

           For more information, visit <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.
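
       The alignment rule described under "--interval" can be modeled as follows.  This sketch reflects
       our reading of that description, not the tool's source code: first_output_delay is a hypothetical
       helper name, and it assumes the tool samples immediately when more than 20% of the current interval
       remains, and otherwise waits for the next even boundary before taking its first sample.

```shell
# first_output_delay START_SECONDS INTERVAL
# Hypothetical helper: seconds from startup until the first lines are printed.
first_output_delay() {
    awk -v start="$1" -v iv="$2" 'BEGIN {
        remain = iv - (start % iv)        # seconds until the next even boundary
        if (remain > 0.2 * iv)
            printf "%g\n", remain         # sample now; the first sample is short
        else
            printf "%g\n", remain + iv    # too little left: wait one more cycle
    }'
}

first_output_delay 53 10   # started at 12:00:53 with a 10-second interval
first_output_delay 59 10   # started at 12:00:59 with a 10-second interval
```

       Under these assumptions the first call prints 7, matching the 7-second first sample in the example
       above, and the second prints 11, which stays under the 120% bound.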

ENVIRONMENT

       The environment variable "PTDEBUG" enables verbose debugging output to STDERR.  To enable  debugging  and
       capture all output to a file, run the tool like:

          PTDEBUG=1 pt-diskstats ... > FILE 2>&1

       Be careful: debugging output is voluminous and can generate several megabytes of output.

SYSTEM REQUIREMENTS

       This tool requires Perl v5.8.0 or newer and the /proc filesystem, unless reading from files.

BUGS

       For a list of known bugs, see <http://www.percona.com/bugs/pt-diskstats>.

       Please  report bugs at <https://jira.percona.com/projects/PT>.  Include the following information in your
       bug report:

       •   Complete command-line used to run the tool

       •   Tool "--version"

       •   MySQL version of all servers involved

       •   Output from the tool including STDERR

       •   Input files (log/dump/config files, etc.)

       If possible, include debugging output by running the tool with "PTDEBUG"; see "ENVIRONMENT".

DOWNLOADING

       Visit <http://www.percona.com/software/percona-toolkit/>  to  download  the  latest  release  of  Percona
       Toolkit.  Or, get the latest release from the command line:

          wget percona.com/get/percona-toolkit.tar.gz

          wget percona.com/get/percona-toolkit.rpm

          wget percona.com/get/percona-toolkit.deb

       You can also get individual tools from the latest release:

          wget percona.com/get/TOOL

       Replace "TOOL" with the name of any tool.

AUTHORS

       Baron Schwartz, Brian Fraser, and Daniel Nichter

ABOUT PERCONA TOOLKIT

       This  tool is part of Percona Toolkit, a collection of advanced command-line tools for MySQL developed by
       Percona.  Percona Toolkit was forked from two  projects  in  June,  2011:  Maatkit  and  Aspersa.   Those
       projects  were  created  by  Baron  Schwartz  and  primarily  developed by him and Daniel Nichter.  Visit
       <http://www.percona.com/software/> to learn about other free, open-source software from Percona.

COPYRIGHT, LICENSE, AND WARRANTY

       This program is copyright 2011-2018 Percona LLC and/or its affiliates, 2010-2011 Baron Schwartz.

       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT  ANY  EXPRESS  OR  IMPLIED  WARRANTIES,  INCLUDING,  WITHOUT
       LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       This  program  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation, version  2;  OR  the  Perl  Artistic
       License.   On  UNIX  and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these
       licenses.

       You should have received a copy of the GNU General Public License along with this program; if not,  write
       to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

VERSION

       pt-diskstats 3.2.1

perl v5.30.3                                       2020-08-30                                   PT-DISKSTATS(1p)