Practical performance monitoring tooling on Linux

While browsing my feeds this morning, I came across this article on HowtoForge called “How To Extract Values From top And Plot Them”.

The first thing that struck me about the article was how they were going about solving their problem:

“Many researchers who are doing performance evaluation and benchmarking need to capture the values of the CPU and the RAM.”

Jesus, if that’s your requirement, top is the wrong tool. There are plenty of utilities out there that will do exactly what you’re looking for:

  • iostat: iostat (as the name suggests) reports I/O-related statistics. It can tell you all about CPU, device, partition, and NFS utilisation.
# report cpu statistics
$ iostat -c 5 5
Linux 2.6.24-12-generic (theodor) 	22/03/08

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         6.73    0.04    2.20    1.32    0.00   89.71

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         1.06    0.00    1.06    0.00    0.00   97.87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         0.97    0.00    1.07    0.00    0.00   97.95

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         1.07    0.00    1.07    0.00    0.00   97.86

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         0.88    0.00    0.59    1.37    0.00   97.17

# report device utilisation of /dev/sda and all its partitions.
# display in megabytes per second.
# take a sample every 5 seconds until interrupted.
$ iostat -p sda -m 5
Linux 2.6.24-12-generic (theodor)       22/03/08

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         6.73    0.04    2.20    1.32    0.00   89.71

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               5.58         0.02         0.04      12383      30201
sda1              7.42         0.01         0.03       4054      18951
sda2              0.24         0.00         0.00        219        439
sda3              4.39         0.01         0.02       8109      10809

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         2.52    0.00    1.26    0.00    0.00   96.22

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sda1              0.00         0.00         0.00          0          0
sda2              0.00         0.00         0.00          0          0
sda3              0.00         0.00         0.00          0          0

^C

  • sar: sar is a brilliant little tool for getting stats on all manner of system activity. Some example usage:
# report CPU utilisation. collect 5 seconds worth of data, 5 times.
$ sar -u 5 5
Linux 2.6.24-12-generic (theodor)       22/03/08
	
12:57:26        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:57:31        all      1.07      0.00      1.07      3.12      0.00     94.73
12:57:36        all      2.42      0.00      1.26      0.00      0.00     96.33
12:57:41        all      1.54      0.00      1.15      0.00      0.00     97.31
12:57:46        all      1.55      0.00      1.36      0.00      0.00     97.09
12:57:51        all      1.84      0.00      1.55      0.00      0.00     96.61
Average:        all      1.68      0.00      1.28      0.62      0.00     96.42
	
# report memory and swap utilisation
$ sar -r 5 5
Linux 2.6.24-12-generic (theodor)       22/03/08
	
12:59:02    kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
12:59:07       947744   1086012     53.40    110560    278340   1131764    334868     22.83    163624
12:59:12       947744   1086012     53.40    110560    278340   1131764    334868     22.83    163656
12:59:17       947744   1086012     53.40    110560    278340   1131764    334868     22.83    163656
12:59:22       947704   1086052     53.40    110560    278372   1131764    334868     22.83    163656
12:59:27       947688   1086068     53.40    110560    278372   1131764    334868     22.83    163656
Average:       947725   1086031     53.40    110560    278353   1131764    334868     22.83    163650
	
# report paging statistics
$ sar -B 5 5
Linux 2.6.24-12-generic (theodor)       22/03/08
	
13:01:50     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
13:01:55         0.00      0.80    439.24      0.00   1046.81      0.00      0.00      0.00      0.00
13:02:00         0.00      1.59    478.88      0.00   1038.84      0.00      0.00      0.00      0.00
13:02:05         0.00      0.00    439.12      0.00    984.83      0.00      0.00      0.00      0.00
13:02:10         0.00      0.00    477.25      0.00   1011.58      0.00      0.00      0.00      0.00
13:02:15         0.00      0.00    443.06      0.00    990.34      0.00      0.00      0.00      0.00
Average:         0.00      0.48    455.53      0.00   1014.54      0.00      0.00      0.00      0.00
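
One thing worth knowing before you plot any of this: the first report iostat prints covers the time since boot, not your sampling interval (which is why both iostat runs above open with the same 6.73 / 89.71 line). Here is a minimal sketch of pulling just %idle out of `iostat -c`, skipping that first report. The function name `iostat_idle` is made up for this example, and it assumes the column layout shown above, with %idle as the last column.

```shell
#!/bin/sh
# Print the %idle value of each iostat CPU report, skipping the first report
# (which covers the time since boot rather than the sampling interval).
# Assumption: an "avg-cpu:" header line is followed by exactly one line of
# values, and %idle is the last column, as in the output shown above.
iostat_idle() {
    awk '
        /^avg-cpu:/ { report++; want = 1; next }   # header: values are on the next line
        want        { want = 0; if (report > 1) print $NF }
    '
}

# Sample copied from the iostat -c output above.
# Live usage would be: iostat -c 5 5 | iostat_idle
iostat_idle <<'EOF'
Linux 2.6.24-12-generic (theodor) 	22/03/08

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         6.73    0.04    2.20    1.32    0.00   89.71

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         1.06    0.00    1.06    0.00    0.00   97.87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
         0.97    0.00    1.07    0.00    0.00   97.95
EOF
```

For the sample above this prints 97.87 and 97.95, one per line. If your version of iostat orders its columns differently, adjust the field accordingly.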

You’ll notice that the iostat and sar outputs are very similar. In fact, they’re almost identical. That’s because they’re from the same project: Sysstat.

Sysstat is a fantastic collection of utilities for doing performance monitoring. Because they all share a common backend, you’re able to do some pretty interesting things with them.

That output we were looking at before? It can be saved to a file and read back later really easily - you can even specify time ranges:

# collect a 10 second sample of network device stats
# log them to the file 'sar-network'
$ sar -n DEV 10 -o sar-network

# ...a whole bunch of output

# retrieve network device stats from the sar-network file
$ sar -n DEV -f sar-network

# ...even more output

# extract the same data, but only between 13:24:00 and 13:25:30
$ sar -n DEV -s 13:24:00 -e 13:25:30 -f sar-network
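
If what you actually want is something to feed a spreadsheet or plotting tool, sar's text output is regular enough to turn into CSV with a few lines of awk. This is a rough sketch rather than the one true way: `sar_cpu_to_csv` is a name invented for this example, and it assumes the `sar -u` column layout shown earlier (time, CPU, %user, %nice, %system, %iowait, %steal, %idle) with 24-hour timestamps.

```shell
#!/bin/sh
# Turn the text output of `sar -u` into CSV: time,user,system,iowait,idle.
# Assumption: columns are time, CPU, %user, %nice, %system, %iowait, %steal,
# %idle, and timestamps are HH:MM:SS with no AM/PM field.
sar_cpu_to_csv() {
    awk '
        BEGIN { print "time,user,system,iowait,idle" }
        # Data rows start with a timestamp and have "all" in the CPU column.
        # This skips the banner, the column header and the Average: line.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 == "all" {
            printf "%s,%s,%s,%s,%s\n", $1, $3, $5, $6, $8
        }
    '
}

# Sample copied from the sar -u output earlier in this post.
# Live usage would be: sar -u 5 5 | sar_cpu_to_csv
sar_cpu_to_csv <<'EOF'
Linux 2.6.24-12-generic (theodor)       22/03/08

12:57:26        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:57:31        all      1.07      0.00      1.07      3.12      0.00     94.73
12:57:36        all      2.42      0.00      1.26      0.00      0.00     96.33
Average:        all      1.68      0.00      1.28      0.62      0.00     96.42
EOF
```

The same function should work on data read back from a saved file (for example `sar -u -f somefile | sar_cpu_to_csv`, where `somefile` is a placeholder for a file written with -o). Sysstat also ships sadf for exporting saved data in other formats; its options vary between versions, so check the man page on your system.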

Newer versions of Sysstat include pidstat, which reports detailed per-process stats including I/O, page faults, memory utilisation, context switches, CPU time, and can even report on child processes and threads.

You can find all these tools in Red Hat- and Debian-based distros, in the sysstat package.

If you’re looking for something a bit more user-friendly in its output (but not necessarily as detailed), there’s always Dstat. It collects essentially the same information as the Sysstat tools, but displays it in a much nicer format (with colourisation, and in cleanly labelled columns).

# show cpu, disk, and network stats
$ dstat -cdn
----total-cpu-usage---- -dsk/total- -net/total-
usr sys idl wai hiq siq| read  writ| recv  send
  7   2  90   1   0   0|  18k   44k|   0     0
  2   1  97   0   0   0|   0     0 |   0     0
  2   0  98   0   0   0|   0     0 |   0     0
  0   1  99   0   0   0|   0     0 |   0     0
  1   1  97   0   0   0|   0     0 |   0     0
  0   1  98   0   0   0|   0     0 |   0     0
  0   1  99   0   0   0|   0     0 |   0     0
  1   0  98   0   0   0|   0     0 |   0     0

^C

You can also output the statistics as CSV, which makes it really easy to import the data into a spreadsheet.

# output file lock, tcp socket, and unix socket statistics to stats.csv
$ dstat --lock --tcp --unix --output stats.csv

Dstat, like Sysstat, can be found in most of the major distros.

Also worth checking out is collectd, the “would you like graphs with that?” network-aware stat collection daemon. It collects the same data as Sysstat and Dstat, but writes it out to RRD for quick and easy graphing love. Collectd is my new favourite piece of software, so I’ll probably post a bit more about it.

I’ve barely scratched the surface of performance monitoring under Linux, but hopefully this will give you a starting point other than “mangle top’s manky output”.