NmonFAQ

Site.NmonFAQ History


January 03, 2017, at 04:11 PM by 127.0.0.1 -
Changed line 6 from:

This website covers nmon for Linux but many users also use nmon for AIX so both are covered in this FAQ.

to:

This website is about nmon for Linux but many users also use nmon for AIX so both are covered in this FAQ.

January 03, 2017, at 04:03 PM by 127.0.0.1 -
Changed line 1 from:

nmon for Linux and AIX Frequently Asked Questions (FAQ)

to:

nmon for Linux and AIX Frequently Asked Questions (FAQ)

January 03, 2017, at 03:12 PM by 127.0.0.1 -
Changed lines 125-127 from:
  1. Do not use kill -9 on nmon as kill -USR2 will end in cleanly!
to:
  1. Do not use kill -9 on nmon as kill -USR2 will end it cleanly!
Changed line 1510 from:

Question 73: Do not use kill -9 on nmonas kill -USR2 will end in cleanly!

to:

Question 74: Do not use kill -9 on nmon as kill -USR2 will end it cleanly!

January 03, 2017, at 03:05 PM by 127.0.0.1 -
Changed lines 125-127 from:
  1. Do not use kill -9 as kill -USR2 will end in cleanly!
to:
  1. Do not use kill -9 on nmon as kill -USR2 will end in cleanly!
Changed line 1510 from:

Question 73: Do not use kill -9 as kill -USR2 will end in cleanly!

to:

Question 73: Do not use kill -9 on nmonas kill -USR2 will end in cleanly!

January 03, 2017, at 02:54 PM by 127.0.0.1 -
Changed lines 125-126 from:
to:
  1. Do not use kill -9 as kill -USR2 will end in cleanly!
Added lines 1508-1547:

Question 73: Do not use kill -9 as kill -USR2 will end in cleanly!

  • Using the kill -9 PID command on nmon to instantly stop it is a thoroughly unpleasant thing to do because if the last line being output is not complete then you have just broken the file format and later this file might fail in a graphing tool in an ugly way.
  • One case where you do want to stop nmon quickly (before its natural end) is in benchmarks, where once the benchmark run is finished you want to stop nmon as any further details are not required.
  • nmon in file capture mode detaches itself from the shell session so that it will continue to run even if you log out or switch off your terminal or X Windows session.
  • This can make it hard to kill as you have to search the "ps -ef | grep nmon | grep -v grep" command output to find the nmon process and, if there is more than one, you have to guess.
  • If you add the -p option to the nmon start command, it will print the process ID of the nmon process before going into the background. For example:
    $ nmon -f -s60 -c 60 -p
    428963
    
  • The 428963 is the PID.
  • To shut down nmon cleanly, use the PID with the polite kill signal USR2 (instead of -9).
  • This requests nmon to stop after the next collection and thus avoids the last line of output being incomplete.
  • So in this example use:
    $ kill -USR2 428963
    
  • You can save the process ID (PID) easily in a script, for example:
    pid=$(nmon -f -s 60 -c 60 -p)
    echo "$pid"
    # . . .
    # Later in the script you can just use $pid
    kill -USR2 "$pid"
    # . . .
    # Alternatively, you could save the PID to a file to pick it up later so your script can finish.
    echo "$pid" >nmon_pid
    # . . .
    # Then much later read the PID back in from the file and stop nmon
    killpid=$(cat nmon_pid)
    kill -USR2 "$killpid"
    
  • Either way you get a nice clean stop of nmon
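In a benchmark script it can also help to wait until the process has really gone before copying the output file. The following is only a sketch: the function name stop_cleanly and the 120 second default timeout are illustrative choices, not part of nmon.

```shell
#!/bin/sh
# Hypothetical helper: send the polite USR2 signal, then wait for the process to exit.
# Usage: stop_cleanly PID [TIMEOUT_SECONDS]
# Returns 0 when the process has gone, 1 if the signal could not be sent,
# 2 if the process is still running after the timeout.
stop_cleanly() {
    pid=$1
    timeout=${2:-120}                 # assumed default, adjust to your snapshot interval
    kill -USR2 "$pid" 2>/dev/null || return 1
    waited=0
    # kill -0 sends no signal, it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 2
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Because nmon stops after the next collection, a timeout a little longer than the -s interval is a sensible starting point, for example: stop_cleanly "$(cat nmon_pid)" 70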
January 03, 2017, at 02:28 PM by 127.0.0.1 -
Changed lines 1470-1471 from:

|disk 0 to 63 @o_8O0__XXXXXXXX

to:

    |disk 0 to 63 @o_8O0__XXXXXXXX                                                                                              |
    +---------------------------------------------------------------------------------------------------------------------------+

Changed lines 1473-1474 from:
  • Note to self: We could use colour on nmon for Linux to highlight the the hotter disks.
to:
  • Note to self: We could use colour on nmon for Linux to highlight the the hotter disks.
January 03, 2017, at 02:27 PM by 127.0.0.1 -
Changed lines 1479-1480 from:
  • Example thei Linux machine has 5 disks and 300+ processes but below thay are reduced to just the busy ones: [@
to:
  • Example here Linux server has 5 disks and 300+ processes but below they are reduced to just the busy ones: [@
Changed lines 1482-1502 from:

    │DiskName Busy  Read WriteMB|0          |25         |50          |75       100|                                             │
    │sda       65%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             │
    │sda1      66%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             │
    │Totals Read-MB/s=0.0      Writes-MB/s=163.6    Transfers/sec=659.2                                                         │
    │ Top Processes Procs=301-mode=3-1=Base 3=Perf 4=Size 5=I/O[RootOnly] u=Args------------------------------------------------+
    │  PID        %CPU      Size       Res      Res       Res       Res      Shared   Faults   Faults Command                   │
    │              Used        KB       Set      Text      Data       Lib        KB      Min      Maj                           │
    │    27906    146.5   2436432    931052        12   1894772         0     47408        0        0 compiz                    │
    │     8863    100.0    121848     17172        32      7484         0      9764        0        0 appstreamcli              │
    │    20880    100.0      7292       704        28       324         0       632        0        0 yes                       │
    │     2027     66.8   1145756    857556      2288    841072         0     20884        0        0 Xorg                      │
    │    13590     20.3    709840    132616      3728    345632         0     81244        0        0 update-manager            │
    │    16905     14.4  13616476   6746892      6396  13309284         0     24864       45        0 qemu-system-x86           │
    │    20822      8.4         0         0         0         0         0         0        0        0 kworker/u16:0             │
    │      396      2.5    747412     42428     27716    688312         0     24172        0        0 docker                    │
    │       51      1.0         0         0         0         0         0         0        0        0 ksmd                      │
    │     2210      1.0         0         0         0         0         0         0        0        0 kworker/4:1H              │
    │    20845      1.0     20168      6308       152      6420         0      2152        0        0 nmon_x86_ubuntu           │
    │        7      0.5         0         0         0         0         0         0        0        0 rcu_sched                 │
    │    20811      0.5         0         0         0         0         0         0        0        0 kworker/u16:3             │
    │    27851      0.5    528156     30556       288    301184         0     23616        0        0 bamfdaemon                │

to:

    |DiskName Busy  Read WriteMB|0          |25         |50          |75       100|                                             |
    |sda       65%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             |
    |sda1      66%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             |
    |Totals Read-MB/s=0.0      Writes-MB/s=163.6    Transfers/sec=659.2                                                         |
    | Top Processes Procs=301-mode=3-1=Base 3=Perf 4=Size 5=I/O[RootOnly] u=Args------------------------------------------------+
    |  PID        %CPU      Size       Res      Res       Res       Res      Shared   Faults   Faults Command                   |
    |              Used        KB       Set      Text      Data       Lib        KB      Min      Maj                           |
    |    27906    146.5   2436432    931052        12   1894772         0     47408        0        0 compiz                    |
    |     8863    100.0    121848     17172        32      7484         0      9764        0        0 appstreamcli              |
    |    20880    100.0      7292       704        28       324         0       632        0        0 yes                       |
    |     2027     66.8   1145756    857556      2288    841072         0     20884        0        0 Xorg                      |
    |    13590     20.3    709840    132616      3728    345632         0     81244        0        0 update-manager            |
    |    16905     14.4  13616476   6746892      6396  13309284         0     24864       45        0 qemu-system-x86           |
    |    20822      8.4         0         0         0         0         0         0        0        0 kworker/u16:0             |
    |      396      2.5    747412     42428     27716    688312         0     24172        0        0 docker                    |
    |       51      1.0         0         0         0         0         0         0        0        0 ksmd                      |
    |     2210      1.0         0         0         0         0         0         0        0        0 kworker/4:1H              |
    |    20845      1.0     20168      6308       152      6420         0      2152        0        0 nmon_x86_ubuntu           |
    |        7      0.5         0         0         0         0         0         0        0        0 rcu_sched                 |
    |    20811      0.5         0         0         0         0         0         0        0        0 kworker/u16:3             |
    |    27851      0.5    528156     30556       288    301184         0     23616        0        0 bamfdaemon                |

January 03, 2017, at 02:24 PM by 127.0.0.1 -
Changed lines 124-125 from:
to:
  1. On-screen displaying only busy Top Processes and Hot disks?
Changed lines 1475-1503 from:
to:

Question 73: On-screen displaying only busy Top Processes and Hot disks?

  • If you do NOT want to see on-screen disks that are zero busy, or Top Processes using zero CPU time, then use the dot command i.e. "."
  • If they are currently displayed online, the dot toggles both the Top Processes and Disk Graph stats at the same time.
  • Example thei Linux machine has 5 disks and 300+ processes but below thay are reduced to just the busy ones:
    | Disk I/O --/proc/diskstats-----mostly in KB/s------Warning:contains duplicates--------------------------------------------+
    │DiskName Busy  Read WriteMB|0          |25         |50          |75       100|                                             │
    │sda       65%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             │
    │sda1      66%    0.0   81.8|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>                |                                             │
    │Totals Read-MB/s=0.0      Writes-MB/s=163.6    Transfers/sec=659.2                                                         │
    │ Top Processes Procs=301-mode=3-1=Base 3=Perf 4=Size 5=I/O[RootOnly] u=Args------------------------------------------------+
    │  PID        %CPU      Size       Res      Res       Res       Res      Shared   Faults   Faults Command                   │
    │              Used        KB       Set      Text      Data       Lib        KB      Min      Maj                           │
    │    27906    146.5   2436432    931052        12   1894772         0     47408        0        0 compiz                    │
    │     8863    100.0    121848     17172        32      7484         0      9764        0        0 appstreamcli              │
    │    20880    100.0      7292       704        28       324         0       632        0        0 yes                       │
    │     2027     66.8   1145756    857556      2288    841072         0     20884        0        0 Xorg                      │
    │    13590     20.3    709840    132616      3728    345632         0     81244        0        0 update-manager            │
    │    16905     14.4  13616476   6746892      6396  13309284         0     24864       45        0 qemu-system-x86           │
    │    20822      8.4         0         0         0         0         0         0        0        0 kworker/u16:0             │
    │      396      2.5    747412     42428     27716    688312         0     24172        0        0 docker                    │
    │       51      1.0         0         0         0         0         0         0        0        0 ksmd                      │
    │     2210      1.0         0         0         0         0         0         0        0        0 kworker/4:1H              │
    │    20845      1.0     20168      6308       152      6420         0      2152        0        0 nmon_x86_ubuntu           │
    │        7      0.5         0         0         0         0         0         0        0        0 rcu_sched                 │
    │    20811      0.5         0         0         0         0         0         0        0        0 kworker/u16:3             │
    │    27851      0.5    528156     30556       288    301184         0     23616        0        0 bamfdaemon                │
    +---------------------------------------------------------------------------------------------------------------------------+
    
January 03, 2017, at 02:07 PM by 127.0.0.1 -
Changed line 67 from:
  1. on AIX the adapter busy goes over 100%. That is impossible surely?
to:
  1. On AIX the adapter busy goes over 100%. That is impossible surely?
Changed lines 90-91 from:
  1. The Disk Busy stats are missing on AIX
  2. Sort order problems with massive nmon output files.
to:
  1. Automatic starting with certain statistics for nmon online mode?
  2. Sort order problems with massive nmon output files?
Changed lines 652-658 from:

Question 42: The Disk Busy stats are missing on AIX, what do I do?

  • If you are watching this online it will be flashing
    • --> To enable disk stats as root: chdev -l sys0 -a iostat=true
  • at you - this is a big hint on how to switch them on !!!
to:

Question 42: Automatic starting with certain statistics for nmon online mode?

  • Use the NMON shell variable to determine which statistics are shown automatically at start up time.
  • If you find you always want CPU, kernel, Memory and Disks i.e. you type: ckmd then set the shell variable as below:
    export NMON=ckmd
    
  • Next time you start nmon these will be shown automatically.
Changed lines 663-664 from:
  • So you collected more than 9999 snapshots in a single nmon capture. Ignoring the fact that the Excel Analyser can't cope with all this data and it makes the data unmanageable.
  • I suggest a good aim is between 400 and 700 snapshots per file for good graphs and manageable file sizes.
to:
  • So you collected more than 9999 snapshots in a single nmon capture.
  • Ignoring the fact that the Excel Analyser can't cope with all this data and it makes the data unmanageable.
    • I suggest a good aim is between 400 and 700 snapshots per file for good graphs and manageable file sizes.
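As a rough sketch of the arithmetic behind the 400 to 700 snapshot guideline: divide the capture duration by the number of snapshots you want and use the result as the -s interval. The function name nmon_interval and the default target of 600 snapshots are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: seconds between snapshots for a given capture duration.
# Usage: nmon_interval DURATION_SECONDS [TARGET_SNAPSHOTS]
nmon_interval() {
    duration=$1
    target=${2:-600}                  # assumed default, inside the 400-700 range
    # Integer division rounded up, so the snapshot count never exceeds the target.
    interval=$(( (duration + target - 1) / target ))
    [ "$interval" -lt 1 ] && interval=1
    echo "$interval"
}
```

For a 24 hour capture (86400 seconds) this gives an interval of 144 seconds, so the capture could be started with: nmon -f -s 144 -c 600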
Changed lines 1466-1469 from:

    │ Disk %Busy Map ──Key: @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0%───────────────────────────────────────────────│
    │             Disk No.  1         2         3         4         5         6                                                 │
    │Disks=4      0123456789012345678901234567890123456789012345678901234567890123                                              │
    │disk 0 to 63 @o_8O0__XXXXXXXX

to:

    + Disk %Busy Map --Key: @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0% ----------------------------------------------+
    |             Disk No.  1         2         3         4         5         6                                                 |
    |Disks=4      0123456789012345678901234567890123456789012345678901234567890123                                              |
    |disk 0 to 63 @o_8O0__XXXXXXXX

January 03, 2017, at 01:46 PM by 127.0.0.1 -
Changed lines 344-347 from:
  • This is working as normal. To get the AIX aioserver stats the details of all processes has to be collected, sorted and searched.

Having paid the CPU cycles for the TOP process stats you may as well see them on the screen or in the output file, so nmon automatically switches them on for you at no addition charge.

to:
  • These are often just called AIO on AIX.
  • This is working as normal.
  • To get the AIX aioserver process stats the details of all processes have to be collected, sorted and searched.
  • Having paid the CPU cycles for the TOP process stats you may as well see them on the screen or in the output file, so nmon automatically switches them on for you at no additional charge!
Changed line 1098 from:
  • Network, Disks stats (not graphs) hit D (upper case d),AIO statistics track the peak values and display them. Also the CPU graphs provide peak indicator.
to:
  • Network, Disks stats (not graphs) hit D (upper case d), AIO statistics track the peak values and display them. Also the CPU graphs provide peak indicator.
January 03, 2017, at 01:42 PM by 127.0.0.1 -
Changed lines 123-124 from:
to:
  1. How can I see 100's of disks on-screen?
Added lines 1439-1468:

Question 72: How can I see 100's of disks on-screen?

  • This is a common problem and nmon for Linux and AIX has a solution for this with the Disk Busy Map
  • This only works online on a screen.
  • Hit: o (lowercase oh!) for the Map
  • Here each disk gets one single character on the screen.
  • The busier the disk the more pixels are shown by using different characters - this is surprisingly effective.
  • nmon for AIX does 50 disks per line
  • nmon for Linux does 64 disks per line
  • Example output from nmon for AIX:
    Disk-Busy-Map-Key(%): @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0%
     hdisks numbers->           1         2         3         4
                      01234567890123456789012345678901234567890123456789
     hdisk0 to 49     __X_X_.__X__Oooo+____#_+___---___X_____@___.______
                      _#___@____X_O___.__O__X____--____.__--__@@_.__@___
                      ___++_O__+__O_O_.__._@@@#___#__oOOo____@__________
                      _--_X@@OO_oo+__#.___X_.__O_+_______@ @XoOOO##@0O_-
    
  • Example output from nmon for Linux:
    │ Disk %Busy Map ──Key: @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0%───────────────────────────────────────────────│
    │             Disk No.  1         2         3         4         5         6                                                 │
    │Disks=4      0123456789012345678901234567890123456789012345678901234567890123                                              │
    │disk 0 to 63 @o_8O0__XXXXXXXX   
    
    • Note to self: We could use colour on nmon for Linux to highlight the the hotter disks.
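To illustrate how the map key reads, here is a small sketch that turns busy percentages into map characters. It assumes each key value is a lower threshold (for example 90% and above shows @), which is a guess from the key line and not taken from the nmon source; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one map character for one whole-number busy percentage.
# Assumes "@=90" means 90% and above, "#=80" means 80-89% and so on.
busy_char() {
    pct=$1
    if   [ "$pct" -ge 90 ]; then echo '@'
    elif [ "$pct" -ge 80 ]; then echo '#'
    elif [ "$pct" -ge 70 ]; then echo 'X'
    elif [ "$pct" -ge 60 ]; then echo '8'
    elif [ "$pct" -ge 50 ]; then echo 'O'
    elif [ "$pct" -ge 40 ]; then echo '0'
    elif [ "$pct" -ge 30 ]; then echo 'o'
    elif [ "$pct" -ge 20 ]; then echo '+'
    elif [ "$pct" -ge 10 ]; then echo '-'
    elif [ "$pct" -ge 5 ];  then echo '.'
    else echo '_'
    fi
}

# Hypothetical helper: one line of the map, one character per disk.
busy_map() {
    for p in "$@"; do
        printf '%s' "$(busy_char "$p")"
    done
    echo
}
```

For example, busy_map 95 35 0 65 55 45 0 0 70 prints @o_8O0__X, the same pattern as the start of the Linux example above.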
January 03, 2017, at 01:04 PM by 127.0.0.1 -
Added line 127:
  1. Got a suggestion ?
January 03, 2017, at 01:03 PM by 127.0.0.1 -
Deleted lines 41-43:
  • Answers in RED are for very old versions of nmon for AIX - At the bottom.
Changed lines 128-129 from:
to:
Deleted lines 1441-1444:

Take this link for historically old stuff Very Old nmon and Very Old AIX Questions


January 03, 2017, at 12:58 PM by 127.0.0.1 -
Deleted lines 129-138:
  1. How to report a problem?
    • OS version,
    • nmon version,
    • the actual command ran,
    • have you read nmon -h output,
    • the problem,
    • have you tried something simpler,
    • send me a sample file,
    • of screen capture.
Changed lines 254-255 from:

Question 11: What if I want support?

to:

Question 11: What if I want support for nmon?

Changed lines 257-263 from:
  • Give me money (and I have no problem with this) or
  • Pay for and use IBM Tivoli Performance Monitoring product with support
  • Pay for and use PM for AIX a remote service where you servers performance data is sent and it generates all the graphs that you can view online.
  • AIX
    • nmon for AIX is a fully supported AIX command so you can raise a IBM Problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.
to:
  • Give me (Nigel Griffiths) loads of money (and I have no problem with this) or
  • AIX: Pay for and use IBM Tivoli Performance Monitoring product with Support
  • AIX: Pay for and use PM for AIX (Also called Performance Manager for AIX) which is a remote service. Your server's performance data is sent to IBM over a secure link and it generates all the performance graphs, reports and suggestions that you can view via a Web Browser.
  • AIX - assuming you have paid IBM for Support
    • nmon for AIX is a fully supported AIX command so you can raise an IBM Problem report (PMR).
    • However, you can't really ask for help with post-processing graphing tools that are not part of AIX. So, please, no nmon Analyser or nmonchart questions.
    • AIX Support is not really there to read the nmon for AIX manual pages for you!
    • Also please read very carefully the nmon help information: nmon -h
    • If you raise a PMR then AIX Support will immediately want a snap report and a PerfPMR data collection during the problem - so have these ready.
  • Next note that nmon is not really a problem determination tool (it is a performance monitoring tool) so AIX support will only take a quick look at the nmon data and move on to other problem determination tools like the excellent AIX trace, svmon tools and other advanced tools.
Changed lines 270-273 from:
  • nmon for Linux is becoming part of the popular distribution - if you have paid for support you could request help
  • You can raise bugs on the sourceforge.net website for the nmon project: https://sourceforge.net/projects/nmon/

If it is something fairly simple you could ask a question on the IBM Performance Tools Forum: IBM Performance Tools Forum

to:
  • nmon for Linux is becoming part of the popular distributions - if you have paid for Linux support you could request help.
  • You can raise bugs on the sourceforge.net website (if you have or get a Sourceforge user account = not hard) for the nmon project: https://sourceforge.net/projects/nmon/

If it is something fairly simple you could ask a question on the IBM Performance Tools Forum (if you have or get an IBM developerWorks user account = not hard): IBM Performance Tools Forum

  • It is rather annoying when I get asked questions like: "The nmon feature #### is broken, please fix it immediately?"
    • Which OS and its version?
    • Which nmon version?
    • What are the symptoms?
  1. How to report an nmon problem well?
  • Please include the following to get a quicker response and save wasting time asking all these questions before getting started:
    1. OS version
      • AIX: oslevel -s
      • Linux Distro: cat /etc/*ease
    2. nmon version:
      • See Question 1.
    3. Briefly, the type and size of hardware
      • Processor type: POWER, Z, AMD, ARM, Intel, other
      • No of CPUs and size of RAM
      • Equipment type: appliance, PC, Laptop, small server, large server - virtual machine or whole server
    4. The actual nmon command run
      • with all the options
      • Or send me the script used to start because I like a laugh! 9 times out of 10 the script is pointless.
    5. Have you read carefully the nmon -h output?
      • This answers 33% of questions - particularly the line: "the -f or -F MUST be the first option on the line"
    6. Describe the symptoms of the perceived problem
      • What were you expecting?
      • What did you get?
    7. Have you tried something simpler?
      • Instead of 25 options have you tried the problem one by itself or using nmon without that option?
    8. Send details
      • Send me a sample file - hopefully not too large
      • or screen capture/scrape showing the problem.

Then you get your question answered sooner.
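The first few items in the checklist can be gathered with a short script. This is only a sketch: the function name nmon_problem_report and the output layout are made up for illustration, and the nmon version still has to be added by hand as described in Question 1.

```shell
#!/bin/sh
# Hypothetical helper: print the basic facts asked for in a problem report.
# Usage: nmon_problem_report "the full nmon command line you ran"
nmon_problem_report() {
    echo "== OS version =="
    if command -v oslevel >/dev/null 2>&1; then
        oslevel -s                              # AIX
    else
        cat /etc/*ease 2>/dev/null || true      # Linux distro release files
        uname -sr
    fi
    echo "== Hardware =="
    echo "Machine: $(uname -m)"
    echo "CPUs online: $(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)"
    if [ -r /proc/meminfo ]; then
        grep '^MemTotal' /proc/meminfo || true  # Linux only
    fi
    echo "== nmon command run =="
    echo "${1:-(paste the full nmon command line here)}"
}
```

Redirect the output to a file and attach it with the description of the symptoms, for example: nmon_problem_report "nmon -f -s 60 -c 60" >report.txt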

January 03, 2017, at 12:19 PM by 127.0.0.1 -
Changed lines 822-824 from:
  1. nmon for LInux reads the test from the following files. Hopefuly the files names explain the data in each. If not go have a look.
  2. There is some information available: from the Linux manual: man 5 proc
    • but it is also vague and does not explain units or why the data is sometimes missing.
to:
  • It is a popular mistake to think that nmon for Linux uses Linux commands to get its data. That would be expensive in CPU cycles especially if you request the data every second.
  • For efficiency nmon for Linux gets the data from system calls (where possible) and from the /proc file system mostly.
    • Even though /proc looks like a bunch of files, they are in fact more like device drivers where a file read results in a system call to get the data and not disk I/O.
  • nmon for Linux reads the text from the following files. Hopefully the file names explain the data in each file. If not then you can take a look at the file.
  • There is some information available: from the Linux manual: man 5 proc
    • but the Manual is often vague and does not explain units or why the data is sometimes missing.
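To see the idea of reading /proc directly instead of running commands, here is a minimal sketch using only the shell's built-in read, so no extra process is started. It is not how nmon itself is written (nmon is a C program); the function name cpu_ticks is hypothetical, and the field order of the first "cpu" line follows man 5 proc.

```shell
#!/bin/sh
# Hypothetical helper: print the first four counters of the total "cpu" line.
# Usage: cpu_ticks [STAT_FILE]     (defaults to /proc/stat)
cpu_ticks() {
    stat_file=${1:-/proc/stat}
    # read is a shell builtin: the file is opened and read without fork/exec.
    # First line looks like: cpu  user nice system idle iowait irq softirq ...
    read -r label user nice system idle rest < "$stat_file"
    echo "user=$user nice=$nice system=$system idle=$idle"
}
```

The values are cumulative counters since boot, so a busy percentage needs two samples and the difference between them.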
January 03, 2017, at 12:03 PM by 127.0.0.1 -
Added lines 1371-1372:
  • These comments are based on the original source code handed over to the AIX Performance Tools developer team - the code could have changed since.
January 03, 2017, at 11:48 AM by 127.0.0.1 -
Added line 102:
Deleted line 103:
Added line 124:
Deleted lines 125-127:

- To do:

Added lines 128-131:

- To do:

  1. How does nmon for AIX extract its data?
Changed lines 1372-1373 from:
to:

Question 71: How does nmon for AIX extract its data?

  • It is a common mistake to assume nmon simply uses AIX commands to extract the data it needs. This is not true.
    • That would require nmon to fork and exec dozens of commands every second (the fastest nmon will run) and that could easily take a whole CPU in computer time.
  • If saving to a nmon file then nmon does use AIX commands (just once) to collect the machine configuration = the BBBx lines at the start of the nmon file.
  • The bulk of the stats comes from a special C language library that comes with AIX called libperfstat.
    • Making a library call to extract performance stats is something like 1000 times faster = 1/1000th of the CPU cycles.
    • This covers all the basics like CPU, memory, Disk I/O and networks in detail plus lots of POWER specific stats, for example, LPAR config & stats, WPAR and virtual CPUs etc.
    • For more information:
      • Read on AIX the C header file for the data structures and function call interfaces at /usr/include/libperfstat.h
      • Read the manual in KnowledgeCenter: perftools libperfstat
      • Note: this often gets updated with each AIX release to add new features.
      • This can make compiling a binary to run on many AIX releases impossible.
      • Fortunately, you get a new updated nmon with every AIX release upgrade.
  • There are some other places it gets specific information:
    • Top Process stats are extracted with a getprocs64() system call - see KnowledgeCenter getprocs manual page
    • File system use is extracted with the old classic UNIX system calls setfsent(), getfsent() and endfsent()
  • The three exceptions to this are due to there not being a library function to get the data.
    • The three commands are:
      1. fcstat for Fibre Channel adapter stats - Command line option -~
      2. entstat for Ethernet stats used for VIOS SEA stats - command line option -O (uppercase O for Ocean = SEA !!!)
      3. Top Process user arguments i.e. expanded command line settings - command line option -T (-t only saves the command name). If using nmon online to a screen these are requested using u or U.
    • The first two commands are not regular UNIX ones but highly device driver dependent and the only place to get the adapter level stats on things like packets per adapter sent and received.
    • On very busy production machines it is recommended to
      1. Not use these nmon command line options to switch on the collection of these stats
      2. Or not collect the stats too quickly - if you collect them, say, once a minute or longer they will not add significant CPU cycles.
      3. Particular Warning: if you have thousands of processes (I have seen servers with 40,000 processes). In this case, nmon can struggle to collect process data at all.
    • The Top Process command line uses the regular user ps -Aeo pid,args command to gather the process command lines used to start the processes
      • This is because they can be long i.e. multiple KB's in size (especially the insane Java commands) and so they are not held in the UNIX Kernel data structures.
    • So system calls do not return the full command line.
      • The full command line is held within the user process virtual memory and can cause paging of process memory to extract them to return to nmon.
      • Note nmon caches these command lines to reduce CPU overheads.
      • If you use nmon online with AIX and further processes are started hit u or U twice to refresh the user command line cache.
      • If collecting to a nmon file the ps -p PID -o "c," -o thcount -o",G,"s (PID is the Process ID) command is only called once for every new process found that needs to go in the output file. This also collects the parent PID, number of threads and the user name plus user group.
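The caching idea for process command lines can be sketched in shell: run ps once, keep the output, and answer later look-ups from the saved copy instead of running ps again. This only illustrates the principle and is not nmon's implementation; the function names and the cache file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run ps once and save PID plus full command line.
# Usage: refresh_args_cache CACHE_FILE
refresh_args_cache() {
    ps -Ao pid,args > "$1"
}

# Hypothetical helper: look up the command line for one PID from the cache.
# Prints nothing if the PID is not in the cache (a new process since the refresh).
# Usage: lookup_args CACHE_FILE PID
lookup_args() {
    awk -v pid="$2" '$1 == pid {
        sub(/^[ \t]*[0-9]+[ \t]+/, "")   # strip the leading PID column
        print
        exit
    }' "$1"
}
```

An empty answer from lookup_args is the cue to refresh the cache, which is similar in spirit to hitting u or U twice in online mode.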

Changed line 1413 from:
to:

January 03, 2017, at 10:50 AM by 127.0.0.1 -
January 03, 2017, at 10:27 AM by 127.0.0.1 -
Changed line 31 from:
  • If you don't have access to a machine to run nmon these might help
to:
  • If you don't have access to a machine to run the nmon command then you can't read the help output - so these links will help you:
January 03, 2017, at 10:24 AM by 127.0.0.1 -
Changed lines 4-6 from:

This is a work in progress and may never be finished - Last update Dec 2016.

to:

This is a work in progress and may never be finished - Last update Jan 2017.

This website covers nmon for Linux but many users also use nmon for AIX so both are covered in this FAQ.

Changed lines 28-30 from:
  • If you don't mind using mind using Excel also take a look and the nmon Analyser above
  • Ignore the videos which are more than 2 years old as they are based on older versions although they are still mostly true.
to:
  • If you don't mind using the Microsoft Excel spreadsheet, also take a look at the nmon Analyser video above.
  • Ignore the YouTube videos which are more than 2 years old as they are based on older versions although they are still mostly true.
Changed line 32 from:
  • The Flash screen that you see when nmon starts up: nmon Flash welcome
to:
  • The Flash screen that you see when nmon starts up: nmon Flash welcome with basic help information
January 03, 2017, at 10:19 AM by 127.0.0.1 -
Changed line 12 from:

This is Nigel Griffiths' YouTube - for lots of POWER Chips, Power Systems machines, AIX, PowerVM, PowerVC, PowerSC, Linux on Power videos

to:

This is Nigel Griffiths' YouTube Channel - for lots of videos on nmon, POWER Chips, Power Systems servers, Performance, AIX, PowerVM, PowerVC, PowerSC, Linux on Power.

Changed line 15 from:

nmon on AIX on Power

to:

nmon on AIX on Power

Changed line 21 from:

nmon for Linux on Power, x86/AMD64, mainframe, ARM.

to:

nmon for Linux on Power, x86/AMD64, mainframe, ARM

December 29, 2016, at 11:02 AM by 127.0.0.1 -
Added lines 936-960:

Another nmon user wants to be able to track the username of processes that are using a lot of CPU time. This is the recommended approach:

  • Briefly in pseudo code and commands:
    NMON_START would create the empty file.
        rm -f /tmp/nmon_proc_user; 
        touch /tmp/nmon_proc_user
    
    NMON_SNAP would append ps output to a log file: 
        # This ps command outputs lines like
        #  PID USER
        # 2122 root
        # 2143 root
        # 2175 root
        # 2224 nag
        # 2226 nag
        ps -Ao pid,user >>/tmp/nmon_proc_user
    
    NMON_END would sort the file and remove duplicates then you have a map of PID to Username
        sort -n /tmp/nmon_proc_user | uniq | awk '{ print "BBBU," $1 "," $2 }'
    
  • You could also look at man ps and select any further columns you fancy.
  • It is assumed we are letting the nmon capture run to completion and do post processing.
  • The data could be appended at the end of the nmon file - perhaps making lines start with (say) BBBU so it's treated as configuration data for a look up feature.
  • A note of warning: running ps takes CPU time, but it's better than, say, opening all the /proc/PID/status files, grep-ing out the Pid and Uid lines and then converting the User ID to a User name. But don't go doing this ps command every second on a machine with 1000's of processes or extremely low memory as it could take a whole CPU out and fail to complete in under a second. If you are capturing, say, once a minute or at a slower rate, the ps command should not damage performance.
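One detail of the NMON_END step is that every ps run also appends its "PID USER" header line to the log. A possible variation, sketched below, drops those header lines and keeps a PID that was reused by a different user as two separate entries. The function name pid_user_map is hypothetical; the BBBU prefix is the one suggested above.

```shell
#!/bin/sh
# Hypothetical helper: turn the appended ps output into unique BBBU,PID,USER lines.
# Usage: pid_user_map /tmp/nmon_proc_user
pid_user_map() {
    # Keep only lines whose first column is a number (skips the "PID USER" headers),
    # sort by PID then user name, and remove exact duplicates.
    awk '$1 ~ /^[0-9]+$/ { print $1 "," $2 }' "$1" |
        sort -t, -k1,1n -k2,2 |
        uniq |
        sed 's/^/BBBU,/'
}
```

The output could then be appended to the finished capture file, for example: pid_user_map /tmp/nmon_proc_user >>mycapture.nmon (the file name here is just a placeholder).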
December 29, 2016, at 10:39 AM by 127.0.0.1 -
Added line 214:
  • See below for External Data Providers (question 57) and User Defined Disk Groups (questions 66 to 68)
Added lines 225-226:
  1. What is causing AIX to run at 99% memory used?
    • This is perfectly normal and shows AIX is making use of memory to optimise performance. It is a "good thing".
December 28, 2016, at 05:52 PM by 127.0.0.1 -
Changed line 924 from:

http:/docs/process_count_graph.jpg

to:

http:/docs/process_count_graph.gif

December 28, 2016, at 05:26 PM by 127.0.0.1 -
Changed lines 196-197 from:
  • First check it is executable (this gets switched off by FTP).
  • Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH.
to:
  • First check the nmon file is executable (this gets switched off by FTP) with: ls -l `which nmon`
  • In case you are new to Linux or UNIX set the executable flag with: chmod ugo+x `which nmon`
  • Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH, for example: /usr/local/bin
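If the binary is copied around by FTP regularly, the check and the fix can be combined. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: set the execute flag on a file only if it is missing.
# Usage: make_executable /usr/local/bin/nmon
make_executable() {
    file=$1
    [ -f "$file" ] || return 1               # not a regular file, nothing to do
    [ -x "$file" ] || chmod ugo+x "$file"    # add execute for user, group and others
}
```

Run it on the full path name of the downloaded binary before the first start.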
December 28, 2016, at 05:20 PM by 127.0.0.1 -
Changed lines 30-31 from:
  • The Flash screen that you see when nmon starts up

XXX

to:
  • The Flash screen that you see when nmon starts up: nmon Flash welcome
Changed line 923 from:

http://www.ibm.com/developerworks/community/wikis/form/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/994db5d5-e1cd-4ab1-adc9-6217a29036ce/attachment/8c8cbc56-5681-436b-b974-d88bf9f373ea/media/process_count_graph.jpg

to:

http:/docs/process_count_graph.jpg

December 28, 2016, at 05:13 PM by 127.0.0.1 -
Changed line 32 from:
to:
  • The command Help Information is very useful so here is a link to nmon -h output: nmon -h output
Changed line 924 from:

https://www.ibm.com/developerworks/community/wikis/form/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/994db5d5-e1cd-4ab1-adc9-6217a29036ce/attachment/8c8cbc56-5681-436b-b974-d88bf9f373ea/media/process_count_graph.jpg

to:

http://www.ibm.com/developerworks/community/wikis/form/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/994db5d5-e1cd-4ab1-adc9-6217a29036ce/attachment/8c8cbc56-5681-436b-b974-d88bf9f373ea/media/process_count_graph.jpg

December 28, 2016, at 05:07 PM by 127.0.0.1 -
Changed lines 17-20 from:
  1. Online on-screen use - https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture to a file - https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) - https://www.youtube.com/watch?v=jH7TnnFDWVg
to:
  1. Online on-screen use - https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture to a file - https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) - https://www.youtube.com/watch?v=jH7TnnFDWVg
Changed lines 23-25 from:
  1. Install, download and online on-screen - https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture to file - https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart - https://www.youtube.com/watch?v=5P4neOqoCTo
to:
  1. Install, download and online on-screen - https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture to file - https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart - https://www.youtube.com/watch?v=5P4neOqoCTo
Changed lines 28-32 from:
to:
  • If you don't have access to a machine to run nmon these might help
    • The Flash screen that you see when nmon starts up

XXX

December 28, 2016, at 04:44 PM by 127.0.0.1 -
December 28, 2016, at 04:43 PM by 127.0.0.1 -
Changed line 258 from:
  • Given me money (and I have no problem with this) or
to:
  • Give me money (and I have no problem with this) or
December 28, 2016, at 04:34 PM by 127.0.0.1 -
Changed lines 269-271 from:

If it is something fairly simple you could ask a question on the IBM Performance tools Forum: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000749

to:

If it is something fairly simple you could ask a question on the IBM Performance Tools Forum: IBM Performance Tools Forum

Changed lines 778-780 from:
  • Read my AIXpert Blog Article at nmon Data Files: Are they a Security Risk?
to:
  • Read my AIXpert Blog Article at nmon Data Files Are they a Security Risk?
Changed line 976 from:
  • Also see my AIXpert Blog article - [[https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_and_External_Data_Collectors|nmon_and_External_Data_Collectors
to:
  • Also see my AIXpert Blog article - nmon_and_External_Data_Collectors
December 28, 2016, at 04:27 PM by 127.0.0.1 -
Changed line 343 from:
  • To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo - and the adapt sample program: https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool
to:
  • To list what nmon is extracting from the libperfstat library you can use the sample code and the binaries precompiled for AIX 5.3 from the Roll Your Own (ryo) Wiki page, and adapt the sample program: Roll-Your-Own Performance Tools
December 28, 2016, at 04:15 PM by 127.0.0.1 -
Changed lines 119-121 from:
to:
  1. How to limit top processes to certain commands?
Changed line 124 from:
  1. Selecting specific processes by command NMONCMD0, NMONCMD1, etc.
to:
Changed line 129 from:
  • have you read nmon -h outout,
to:
  • have you read nmon -h output,
Changed lines 134-137 from:
  1. nmon for Linux 16 - major user interface upgrade? https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_for_Linux_v16_New_Stats_On_screen_Facelift_more
to:
  1. nmon for Linux 16 - major user interface upgrade? nmon for Linux v16 New Stats On screen Facelift more
Changed lines 1323-1324 from:
  • The start nmon and it will just show you these commands
to:
  • Then start nmon and it will show just these commands on-screen or save just these commands to the nmon file.
Changed lines 1329-1330 from:
to:
  • Up to 64 commands in this list.
Changed line 1332 from:
  1. The command is only checked up to the characters you give it, so "or" will match "oracle" and "orifice" = limited wild cards feature!
to:
  1. The command name is only checked up to the characters you give, so "or" will match "oracle" and "orifice" = a sort of limited wild cards feature!
December 28, 2016, at 03:49 PM by 127.0.0.1 -
Deleted line 1259:
Added lines 1306-1330:

Question 70: How to limit top processes to certain commands?

  • If there are lots of processes running but you want to limit your monitoring to just a few commands of particular interest then you can do this in two ways for online and file capture modes. Note these are the program names and don't include the parameters.
  • Method 1: Using shell variables
    • There are 64 shell variables to use and set to the commands you want to monitor. Follow this simple example to monitor just ksh, vi and syncd commands:
    • Setting the commands of interest:
      export NMONCMD0=ksh
      export NMONCMD1=vi
      export NMONCMD2=syncd
      
    • The start nmon and it will just show you these commands
  • Method 2: Using nmon command line options
    • This involves using the -C option:
      nmon -C ksh:vi:syncd
      
  • Notes:
    1. The command is only checked up to the characters you give it, so "or" will match "oracle" and "orifice" = limited wild cards feature!
    2. If you are new to UNIX then also note that you use the "unset" command to remove this shell variable as in: unset NMONCMD0
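The prefix rule in Note 1 can be illustrated with a few lines of shell. This is a sketch of the matching behaviour as described above, not nmon's own code; the function name matches_cmd_list is hypothetical and the list uses the same colon-separated form as the -C option.

```shell
# Illustrative only: mimic the prefix rule described above, not nmon's own code.
# Succeeds if the process name starts with any pattern in the colon-separated list.
matches_cmd_list() {
    name="$1"
    list="$2"
    old_ifs="$IFS"
    IFS=:                      # split the list on colons
    for pattern in $list; do
        case "$name" in
            "$pattern"*) IFS="$old_ifs"; return 0 ;;   # name begins with pattern
        esac
    done
    IFS="$old_ifs"
    return 1
}
```

With this rule "or" matches both "oracle" and "orifice", and a list containing "vi" would also pick up "vim", so choose the shortest prefix that is still specific enough.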
December 28, 2016, at 03:39 PM by 127.0.0.1 -
Added line 122:
  1. Selecting specific processes by command NMONCMD0, NMONCMD1, etc.
December 28, 2016, at 03:30 PM by 127.0.0.1 -
Changed lines 120-123 from:

- To do

  1. Simple User Defined Disk Group example
  2. Automatic User Defined Disk Group "-g auto" to stop the disk/partition duplication?
to:

- To do:

Changed lines 122-126 from:
  1. How to report a problem? OS version, nmon version, the actual command ran, have you read nmon -h outout, the problem, have you tried something simpler, send me a saple file, of screen capture.
  2. nmon for Linux 16 - major user interface upgrade? https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_for_Linux_v16_New_Stats_On_screen_Facelift_more?lang=en
to:
  1. How to report a problem?
    • OS version,
    • nmon version,
    • the actual command ran,
    • have you read nmon -h outout,
    • the problem,
    • have you tried something simpler,
    • send me a sample file,
    • or screen capture.
  2. nmon for Linux 16 - major user interface upgrade? https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_for_Linux_v16_New_Stats_On_screen_Facelift_more
Changed lines 1203-1207 from:
  • The 64 groups and 512 disks is definitely correct as we have the code open source.
to:
  • The Linux equivalent of the Question 67 AIX examples file could be
    Linux-OS sda sdb
    backup sdd sde
    
  • The nmon limits of 64 groups and 512 disks are definitely correct as we have the code open source - I checked.
Changed line 1309 from:
      - - - F r e q u e n t l y  -  A s k e d  -  Q u e s t i o n s  -  E n d  - - -
to:
      - - - F r e q u e n t l y  -  A s k e d  -  Q u e s t i o n s  -  E n d  - - -
December 28, 2016, at 03:11 PM by 127.0.0.1 -
Added lines 1245-1247:
  • Note: It will leave the file auto in the current directory. If you start nmon again with an auto file present, it will overwrite the current file.
  • Note: If you have User Defined Disk Groups switched on then you can also switch on the Extended Disk stats with -d - see the question below.
December 28, 2016, at 03:04 PM by 127.0.0.1 -
Changed lines 1230-1232 from:

├─sda1 8:1 1 172.8G 0 part / ├─sda2 8:2 1 1K 0 part └─sda5 8:5 1 32G 0 part [SWAP]

to:

+-sda1 8:1 1 172.8G 0 part /
+-sda2 8:2 1 1K 0 part
\-sda5 8:5 1 32G 0 part [SWAP]

Deleted line 1233:
December 28, 2016, at 03:03 PM by 127.0.0.1 -
Changed lines 1230-1232 from:

‚‚sda1 8:1 1 172.8G 0 part / ‚‚sda2 8:2 1 1K 0 part ‚‚sda5 8:5 1 32G 0 part [SWAP]

to:

├─sda1 8:1 1 172.8G 0 part /
├─sda2 8:2 1 1K 0 part
└─sda5 8:5 1 32G 0 part [SWAP]

Added line 1234:
December 28, 2016, at 03:00 PM by 127.0.0.1 -
Changed lines 1206-1208 from:
  1. We still have one problem: the disk I/O is reported against a partition (like sda1) AND the disk (like sda) which results in duplication.
  • if you want to see yout disk + partitions a good command is lsblk (I think this means list block devices.
to:
  1. We still have one problem: the disk I/O is reported against a partition (like sda1) AND the disk (like sda) which results in duplication.
  • If you want to see your disk + partitions a good command is lsblk (I think this means list block devices).
Changed line 1213 from:

If is was just the case of sda and sda1 duplication we could code in ignoring the sda1 but I have seen disks with all sorts of names, disk partition naming conventions as letters and numbers or mixed, in different orders and names including punctuation like &*%$£"!?@~

to:
  • If it was just the case of sda and sda1 duplication we could code in ignoring the sda1 but I have seen disks with all sorts of names, disk partition naming conventions as letters and numbers or mixed, in different orders and names including punctuation like &*%$£"!?@~
Changed lines 1215-1218 from:

Its better to live with duplicates that can be explained as nmon is just the messenger and if you don't like it talk to the Linux Kernel developers.

to:
  • It's better to live with duplicates that can be explained, as nmon is just the messenger; if you don't like it, talk to the Linux Kernel developers.
  • However, I relented in the end.
    • With the new lsblk command, which defines very clearly what is a disk and what is a partition, we have off-loaded the decision.
    • By using the User Defined Disk Groups feature we can define the included disks and what is ignored.
  • To switch on this ignoring of the partitions, and thus the deduplicate feature, use the -g auto option
    • Before it really starts, nmon takes the output of the lsblk command, ignores partitions, generates a User Defined Disk Groups file called auto and uses that for the disk groups.
    • The actual command run depends on the Linux version:
    1. lsblk --nodeps --output NAME --noheadings
    2. lsblk --nodeps --output NAME,TYPE --raw
    • Here is some example output
      $ lsblk
      NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
      sda      8:0    1 204.8G  0 disk
      ├─sda1   8:1    1 172.8G  0 part /
      ├─sda2   8:2    1     1K  0 part
      └─sda5   8:5    1    32G  0 part [SWAP]
      sr0     11:0    1  1024M  0 rom
      
      $ lsblk --nodeps --output NAME,TYPE --raw
      NAME TYPE
      sda disk
      sr0 rom
      
      $ lsblk --nodeps --output NAME --noheadings
      sda
      sr0
      $
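To get a feel for what such a generated groups file could look like, here is a rough sketch that turns the NAME,TYPE output shown above into group lines. It is not the code nmon runs. The function name lsblk_to_groups is hypothetical, and it assumes one group per whole disk with the disk name reused as the group name, keeping only devices of type disk.

```shell
# Sketch: read "lsblk --nodeps --output NAME,TYPE --raw" output on stdin and
# print one disk-group line per whole disk.
# Assumption: each line is "group_name disk", with the disk name reused as the group name.
lsblk_to_groups() {
    # NR > 1 skips the "NAME TYPE" header; only rows of type "disk" are kept
    awk 'NR > 1 && $2 == "disk" { print $1, $1 }'
}
```

For example: lsblk --nodeps --output NAME,TYPE --raw | lsblk_to_groups > mygroups, and then start nmon with -g mygroups. Partitions are already left out by --nodeps; the TYPE test additionally drops entries like sr0.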
      
Changed line 1250 from:
  • A decade ago I realised that there are
to:
  • A decade ago I realised that there are
Changed line 1256 from:
  • DISKBUSY highlights which disks are slowing you dowm,
to:
  • DISKBUSY highlights which disks are slowing you down,
Changed lines 1293-1297 from:
  1. DISKWAIT,Disk Wait Queue Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
      - - - F r e q u e n t l y   A s k e d   Q u e s t i o n s   E n d  - - -
to:
  1. DISKWAIT,Disk Wait Queue Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
      - - - F r e q u e n t l y  -  A s k e d  -  Q u e s t i o n s  -  E n d  - - -
December 28, 2016, at 02:33 PM by 127.0.0.1 -
Added lines 1197-1215:
  • The above applies to nmon on Linux too
  • The 64 groups and 512 disks is definitely correct as we have the code open source.
  • Of course, the disks are named very differently - typically sda, sdb, sdc etc. (not hdisk0, hdisk1 etc that we find in AIX).
  • However, Linux comes with its own problems; one is historic but still true for PC size machines.
    • This is that the disks are often partitioned - for example /dev/sda1 is the first (1) disk partition on the first disk (a).
    • But that is not the problem. In the /proc file-system (where nmon gets the disk stats) there are multiple problems:
    1. In the past there were different files in different formats, which made developing nmon unnecessarily hard work.
      • Fortunately all current Linux Distributions have moved to /proc/diskstats
    2. We still have one problem: the disk I/O is reported against a partition (like sda1) AND the disk (like sda) which results in duplication.
  • if you want to see yout disk + partitions a good command is lsblk (I think this means list block devices.
  • The result is nmon reports double or even triple the disk I/O stats.
  • Why don't we just code around this problem?
  • nmon rule: Don't hide (or remove) data because if the code is wrong or the data format changes in the future that would drive the nmon user mad!!!
  • I should add: or some device driver developer uses truly bonkers disk names.

If is was just the case of sda and sda1 duplication we could code in ignoring the sda1 but I have seen disks with all sorts of names, disk partition naming conventions as letters and numbers or mixed, in different orders and names including punctuation like &*%$£"!?@~

  • That makes removing the duplicates very error prone and as the nmon developer I don't want to take the blame for some massive performance problem that I made invisible by removing the wrong data.

Its better to live with duplicates that can be explained as nmon is just the messenger and if you don't like it talk to the Linux Kernel developers.

December 28, 2016, at 02:11 PM by 127.0.0.1 -
Changed lines 1140-1152 from:
to:

This feature is covered in the nmon -h output as follows

  •         -g <filename> User decided Disk Groups
                          - file = on each line: group_name <hdisk_list> space separated
                          - like: rootvg hdisk0 hdisk1 hdisk2
                          - upto 32 groups hdisks can appear more than once
    
    
  • I think this limit is actually 64 disk groups and each can have 512 disks but don't quote me on that - I don't have the source code any longer.
  • If you want the name to appear nicely on-screen then keep the name below 12 characters.
Changed lines 1170-1176 from:

┌─topas_nmon─────────────────────Host=blue───────────Refresh=2 secs───13:55.47──────┐ │ Disk-Group-I/O ───────────────────────────────────────────────────────────────────│ │Name Disks AvgBusy Read|Write-KB/s TotalMB/s xfers/s BlockSizeKB │ │rootvg 2 0.0% 0.0|0.0 0.0 0.0 0.0 │ │backup 1 99.5% 71687.5|71677.5 140.0 844.0 169.9 │ │Groups= 2 TOTALS 3 33.2% 71687.5|71677.5 140.0 844.0 │ │───────────────────────────────────────────────────────────────────────────────────│

to:

+--Disk-Group-I/O-------------------------------------------------------------------+
|Name Disks AvgBusy Read|Write-KB/s TotalMB/s xfers/s BlockSizeKB |
|rootvg 2 0.0% 0.0|0.0 0.0 0.0 0.0 |
|backup 1 99.5% 71687.5|71677.5 140.0 844.0 169.9 |
|Groups= 2 TOTALS 3 33.2% 71687.5|71677.5 140.0 844.0 |
+-----------------------------------------------------------------------------------+

December 28, 2016, at 02:02 PM by 127.0.0.1 -
Added lines 115-117:
  1. What are User Defined Disk Groups for?
  2. Using User Defined Disk Groups with nmon for AIX?
  3. Using User Defined Disk Groups with nmon for Linux?
Changed lines 1125-1127 from:

Question 65: How do I get more disk stats because I can never get enough of these?

to:

Question 66: What are User Defined Disk Groups for?

Here are a few good use cases for this nmon feature that is covered in more details in the following three questions:

  1. Servers with 100's or 1000's of disks are very difficult to monitor on screen.
    • Unless you have a screen that can display 100's of lines!
    • You can reduce the on-screen disk graphs to a tiny font size and use a modern HD screen but there are limits.
  2. Servers with 100's or 1000's of disks are very difficult to graph later.
    • One extreme case with 4000 disks produced a black oblong because there were so many lines.
    • They complained that they could not see the details so the disks were unmanageable. They are correct - the problem was their default LUN size on the Fibre-Channel disks was ridiculously small but this is not nmon's fault. It was set-in-stone, out of date systems management practices.
  3. With many disks with the same data it is useful to group the disks together and then see the total I/O to that group of disks.
    • For example: the disks that make up a RDBMS data, RDBMS index and RDBMS logs - each should have different I/O characteristics in RW ratio, and block sizes.
    • For example: the disks used for backup, batch processing or background tasks like data arriving to be loaded in to a database - will be busy at different times.
  4. On AIX hdiskN and on Linux sdX are not helpful names while monitoring - changing the name to something meaningful aids comprehension
    • For example: rootvg, paging, webpages or rdbms_log immediately lets you know the data on the disk(s).

Question 67: Using User Defined Disk Groups with nmon for AIX?

  • To use User Defined Disk Groups you need to prepare a diskgroup to disks mapping file.
  • This is not complicated or hard to do - if you know your machine
  • This small text file has a simple format:
    • At the start of the line is the diskgroupname followed by a list of hdiskN, all separated by space characters.
    • For example: a file called diskgroups
      rootvg hdisk5 hdisk5
      backup hdisk7
      
  • You can call your file anything you like but here we call it diskgroups
Then when you start nmon add at the end:
nmon -f . . .  -g diskgroups
  • If collecting data to a file you will find new data in the file on lines starting DG
    • The nmon Analyser and nmonchart both know what to do with this data.
  • If monitoring on-screen, to get the Disk Groups displayed, hit: g

┌─topas_nmon─────────────────────Host=blue───────────Refresh=2 secs───13:55.47──────┐
│ Disk-Group-I/O ───────────────────────────────────────────────────────────────────│
│Name Disks AvgBusy Read|Write-KB/s TotalMB/s xfers/s BlockSizeKB │
│rootvg 2 0.0% 0.0|0.0 0.0 0.0 0.0 │
│backup 1 99.5% 71687.5|71677.5 140.0 844.0 169.9 │
│Groups= 2 TOTALS 3 33.2% 71687.5|71677.5 140.0 844.0 │
│───────────────────────────────────────────────────────────────────────────────────│

  • You can immediately see there is a backup running, rather than just seeing that one disk is inexplicably very busy and needs further investigation.
  • Note you could make this more complex and even have a disk in more than one group line - a disk might be in the RDBMS Disk Group but also in the RDBMS_Logs disk group.
  • Here is a sample I used on a benchmark:
    root hdisk0 hdisk1
    home hdisk2 hdisk3
    apps hdisk4 hdisk5 hdisk6
    data hdisk7 hdisk8 hdisk9 hdisk10 hdisk11 hdisk12 hdisk13 hdisk14
    index hdisk15 hdisk16 hdisk17 hdisk18 hdisk19 hdisk20 hdisk21 hdisk22
    archive hdisk23 hdisk24 hdisk25
    sort hdisk26 hdisk27 hdisk28 hdisk29 hdisk30
    logs hdisk31 hdisk32
    others hdisk33 hdisk34
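Before handing a groups file like this to nmon -g it can be worth a quick sanity check. The sketch below is not part of nmon and the function name check_diskgroups is hypothetical. It uses the numbers quoted in this FAQ: at most 64 groups, and names below 12 characters if you want them to look nice on-screen.

```shell
# Sketch: sanity-check a disk groups file read from stdin.
# Limits are taken from the text above (64 groups, names under 12 characters for on-screen display).
check_diskgroups() {
    awk '
        NF == 0 { next }                                   # skip blank lines
        NF < 2  { printf "line %d: group %s has no disks\n", NR, $1; bad = 1 }
        length($1) >= 12 { printf "line %d: warning, name %s is 12 or more characters\n", NR, $1 }
        { groups++ }
        END {
            if (groups > 64) { print "more than 64 groups"; bad = 1 }
            printf "%d groups checked\n", groups
            exit (bad ? 1 : 0)                             # non-zero exit status on errors
        }
    '
}
```

Run it as: check_diskgroups < diskgroups. Long names only produce a warning because they are a display matter, while a group with no disks or more than 64 groups gives a non-zero exit status.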
    

Question 68: Using User Defined Disk Groups with nmon for Linux?

Question 69: How do I get more disk stats because I can never get enough of these?

December 28, 2016, at 12:40 PM by 127.0.0.1 -
Changed lines 1127-1130 from:
  1. far to many stats for disks available
  2. there are far to many disks - I have samples of servers with 1000's of disks (pretty dumb IMHO)
  • This could result in nmon becoming a disk stat collection tool with a minor extra data covering CPU and memory!!
to:
  1. far too many stats available for disks - this is due to "old school thinking: that it's always the disks causing problems"
  2. there are far too many disks - I have samples of servers with 1000's of disks (pretty dumb IMHO as you can't graph them all)
  • This could result in nmon becoming a disk stat collection tool with minor extra data covering CPU and memory on the side!!
Changed line 1136 from:
  • I think that is enough to workout most disk problems.
to:
  • I think that is enough to work out most disk problems
December 28, 2016, at 12:37 PM by 127.0.0.1 -
Changed lines 115-116 from:
to:
  1. How do I get more disk stats because I can never get enough of these?
Added lines 1120-1170:

Question 65: How do I get more disk stats because I can never get enough of these?

  • A decade ago I realised that there are
    1. far to many stats for disks available
    2. there are far to many disks - I have samples of servers with 1000's of disks (pretty dumb IMHO)
  • This could result in nmon becoming a disk stat collection tool with a minor extra data covering CPU and memory!!
  • We have
    • DISKBUSY highlights which disks are slowing you dowm,
    • DISKREAD and DISKWRITE highlights how much data you are shifting,
    • DISKXFER highlights if you are approaching the disk seek limits and adapter operation limits and
    • DISKBSIZE highlights if your application is doing silly small blocks.
  • I think that is enough to workout most disk problems.

nmon for Linux

  • nmon for Linux already has what you ask for but, for servers with large numbers of disks, it comes at a price in the output file size.
  • You can switch on extended disk stats with (for example): nmon -f -s10 -c 600 -g auto -D
  • It's the -g auto -D that is important.
  • You could use your own User Defined Disk Groups file for the -g option but "auto" generates this for you and strips out the disk partition duplication.
  • This adds for my simple two disk (it is RAID5-ed) server called violet the following stats:
    1. DGBUSY,Disk Group Busy violet,sda,sdb
    2. DGREAD,Disk Group Read KB/s violet,sda,sdb
    3. DGWRITE,Disk Group Write KB/s violet,sda,sdb
    4. DGSIZE,Disk Group Block Size KB violet,sda,sdb
    5. DGXFER,Disk Group Transfers/s violet,sda,sdb
    6. DGREADS,Disk Group read/s violet,sda,sdb
    7. DGREADMERGE,Disk Group merged read/s violet,sda,sdb
    8. DGREADSERV,Disk Group read service time (SUM ms) violet,sda,sdb
    9. DGWRITES,Disk Group write/s violet,sda,sdb
    10. DGWRITEMERGE,Disk Group merged write/s violet,sda,sdb
    11. DGWRITESERV,Disk Group write service time (SUM ms) violet,sda,sdb
    12. DGINFLIGHT,Disk Group in flight IO violet,sda,sdb
    13. DGIOTIME,Disk Group time spent for IO (ms) violet,sda,sdb

    14. DGBACKLOG,Disk Group Backlog time (ms) violet,sda,sdb

I hope the names are clear enough for you to understand the meaning.
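If you just want a quick number out of one of these stats without a graphing tool, a little awk will do. This is a sketch under an assumption about the file layout: the header line is taken to be "TAG,description,name1,name2,..." and each data line "TAG,Tnnnn,value1,value2,...". The function name nmon_column_average is hypothetical.

```shell
# Sketch: average each column of one stat (for example DGBUSY) from nmon capture data on stdin.
# Assumption: the header line is "TAG,description,name1,name2,..." and
# each data line is "TAG,Tnnnn,value1,value2,...".
nmon_column_average() {
    awk -F, -v tag="$1" '
        $1 != tag { next }                                  # ignore every other stat
        $2 !~ /^T[0-9]+$/ {                                 # header line: remember the column names
            for (i = 3; i <= NF; i++) name[i] = $i
            cols = NF
            next
        }
        { for (i = 3; i <= NF; i++) sum[i] += $i; rows++ }  # data line: add up each column
        END {
            if (rows == 0) exit 1                           # no data lines found for this tag
            for (i = 3; i <= cols; i++) printf "%s %.1f\n", name[i], sum[i] / rows
        }
    '
}
```

For example: nmon_column_average DGBUSY < myfile.nmon, where myfile.nmon stands for whatever capture file you have.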

nmon for AIX

  • Warning: these flags are different for AIX
    • -D switches off the disk configuration collection at the start of the nmon files. This can be useful when you have 100's of disks as this config collection can take time (10's of seconds to a few minutes) and can hang if you have serious disk problems - in which case, don't go blaming nmon and go fix your disks. errpt is a good place to start.
    • -d switches on Disk Service time stats
  • For example on my four disk web server running AIX 7.2 I get these extra lines:
    1. DISKSERV,Disk Service Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
    2. DISKREADSERV,Disk Read Service Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
    3. DISKWRITESERV,Disk Write Service Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
    4. DISKWAIT,Disk Wait Queue Time msec/xfer blue,usbms0,hdisk7,hdisk6,hdisk5,hdisk4
December 28, 2016, at 12:21 PM by 127.0.0.1 -
Changed line 492 from:
  • Use the -D flag to stop nmon collecting disk configuration each time can really helps to reduce the start up time.
to:
  • nmon for AIX only: Using the -D flag to stop nmon collecting disk configuration each time can really help to reduce the start up time.
Changed lines 782-783 from:
  • If on AIX try an alternative command like topas -D - dDoes that give you the same stats ?
to:
  • If on AIX try an alternative command like topas -D
    • Does that give you the same stats ?
December 25, 2016, at 09:37 PM by 127.0.0.1 -
Changed lines 1103-1109 from:
  • Size,ResSet,ResText,ResData are the memory stats

Size,ResSet,ResText,ResData are the memory stats

  • Size is the program size as found on the file system file from which you start the program - this is fixed.
  • ResSet is the resident set size - this is the memory of the process running the program- it changes as the program runs (typically growing but can shrink) and it is partly shared across other processes running the same program.
  • ResTest is the resident set size of the code of the program (this is read-only so highly shared).
  • ResData is the resident set size of the date of the program (this is mostly read-write can be shared but on the first write to a memory page a copy is made for that particular process).
to:

Size, ResSet, ResText, ResData are the Memory stats

  • Size is the program size as found on the file system file from which you start the program - this is fixed.
  • ResSet is the resident set size - this is the memory of the process running the program - it changes as the program runs (typically growing but can shrink) and it is partly shared across other processes running the same program.
  • ResText is the resident set size of the code of the program (this is read-only so highly shared).
  • ResData is the resident set size of the data of the program (this is mostly read-write and can be shared, but on the first write to a memory page a copy is made for that particular process).
Changed lines 1111-1112 from:

If you want one number for the memory size of a process then use (ResTest + ResData) but note some of that memory is shared between processes.

to:

If you want one number for the memory size of a process then use (ResText + ResData) but note some of that memory is shared between processes.

Changed lines 1120-1122 from:
      - - - The End - - -
to:
      - - - F r e q u e n t l y   A s k e d   Q u e s t i o n s   E n d  - - -
December 25, 2016, at 09:33 PM by 127.0.0.1 -
Changed lines 114-115 from:
to:
  1. Please explain the TOP Process Memory stats?
Added lines 1092-1119:

Question 65: Please explain the TOP Process Memory stats?

Before answering I am going to assume you are aware there is no single number that tells you everything about the memory of a process. This is because of many complications: programs share program code memory (one read-only copy for all processes running the same program) and partially share data (on a fork() the memory is shared with a Copy-On-Write flag to make different copies only if a page is written to); some of the program can be paged to/from disk or paged from file systems; and some might not exist in memory unless it's updated (static data in the program file).

TOP process stats (switched on with -t or -T) have a header line describing the columns like this for Linux

  • TOP,+PID,Time,Usr,%Sys,Size,ResSet,ResText,ResData,ShdLib,MinorFault,MajorFault,Command,Threads,IOwaitTime

and like this for AIX

  • TOP,+PID,Time,Usr,RAM,Paging,Command,WLMclass
  • Size,ResSet,ResText,ResData are the memory stats

Size,ResSet,ResText,ResData are the memory stats

  • Size is the program size as found on the file system file from which you start the program - this is fixed.
  • ResSet is the resident set size - this is the memory of the process running the program- it changes as the program runs (typically growing but can shrink) and it is partly shared across other processes running the same program.
  • ResTest is the resident set size of the code of the program (this is read-only so highly shared).
  • ResData is the resident set size of the date of the program (this is mostly read-write can be shared but on the first write to a memory page a copy is made for that particular process).
  • To make life complex a typical C program will be linked to shared libraries (the most common is the C Lib to support C library functions and system calls). Typically, a minimum of around eight libraries but it could be 50+. Each library can have read-only code, read-only static data plus read-write memory which is partly shared.
  • Now add in memory mapped files and you start to see it's complicated.

If you want one number for the memory size of a process then use (ResTest + ResData) but note some of that memory is shared between processes.

nmonchart in its TOP Process bubble chart reports the maximum value found in all the memory sizes reported for a particular process i.e. ResText + ResData.

  • Note nmonchart assumes all the processes with the same name are the same. So for, say, Apache the processes with the name "httpd" are treated as one:
    • For CPU: all the CPU time is added together
    • For memory it takes the highest value of ResText + ResData across all the processes
    • For IO: all the I/O is added together
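The three rules above can be written down in a few lines of awk. This is a sketch of the rule as described here, not nmonchart's own code. The input layout (command, CPU, ResText, ResData, I/O separated by spaces) is made up to keep the example short, and the function name aggregate_top is hypothetical.

```shell
# Sketch of the aggregation rule described above, not nmonchart's own code.
# Hypothetical input on stdin: "command cpu restext resdata io", one line per process sample.
aggregate_top() {
    awk '
        {
            cpu[$1] += $2                                       # CPU: add everything together
            mem = $3 + $4                                       # memory: ResText + ResData
            if (!($1 in max) || mem > max[$1]) max[$1] = mem    # memory: keep the highest value
            io[$1] += $5                                        # I/O: add everything together
        }
        END { for (c in cpu) printf "%s cpu=%.1f mem=%d io=%d\n", c, cpu[c], max[c], io[c] }
    ' | sort
}
```

Two httpd samples with memory of 300 and 500 therefore show up as one httpd entry with mem=500, while their CPU and I/O figures are summed.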
December 22, 2016, at 08:04 AM by 127.0.0.1 -
Added lines 117-118:
  1. Simple User Defined Disk Group example
  2. Automatic User Defined Disk Group "-g auto" to stop the disk/partition duplication?
Added line 121:
  1. nmon for Linux 16 - major user interface upgrade? https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_for_Linux_v16_New_Stats_On_screen_Facelift_more?lang=en
December 22, 2016, at 07:54 AM by 127.0.0.1 -
Changed lines 113-114 from:
to:
  1. How to determine optimal memory size for a VM from nmon data?
Changed line 1068 from:
  • Briefly the data that might be a risk:
to:
  • Briefly, the data that might be a risk:
Added lines 1078-1086:

Question 64: How to determine optimal memory size for a VM from nmon data?

  • Briefly, this is very hard to determine.
    • Completely unused memory can be "harvested".
    • You may or may not be able to reassign file-system cache memory
    • But if you are short on memory there can be latent demand that can't be predicted.
  • See my AIXpert Blog article for the full information - How to determine optimal memory size for a VM from nmon data
December 21, 2016, at 06:58 PM by 127.0.0.1 -
Changed lines 17-20 from:
  1. Online on-screen use https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture to a file https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) https://www.youtube.com/watch?v=jH7TnnFDWVg
to:
  1. Online on-screen use - https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture to a file - https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) - https://www.youtube.com/watch?v=jH7TnnFDWVg
Changed lines 23-25 from:
  1. Install, download and online on-screen' https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture to file https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart https://www.youtube.com/watch?v=5P4neOqoCTo
to:
  1. Install, download and online on-screen - https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture to file - https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart - https://www.youtube.com/watch?v=5P4neOqoCTo
December 21, 2016, at 06:57 PM by 127.0.0.1 -
Changed lines 17-20 from:
  1. Online use https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) https://www.youtube.com/watch?v=jH7TnnFDWVg
to:
  1. Online on-screen use https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture to a file https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) https://www.youtube.com/watch?v=jH7TnnFDWVg
Changed lines 23-25 from:
  1. Install, download and online https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart https://www.youtube.com/watch?v=5P4neOqoCTo
to:
  1. Install, download and online on-screen' https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture to file https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart https://www.youtube.com/watch?v=5P4neOqoCTo
December 21, 2016, at 06:56 PM by 127.0.0.1 -
Changed line 8 from:

Frequently Asked Question may be Answered by a Quick Video

to:

Frequently Asked Question may be Answered by a Quick Video

Changed line 30 from:

Frequently Asked Question

to:

Frequently Asked Question

December 21, 2016, at 06:55 PM by 127.0.0.1 -
Added lines 7-9:

Frequently Asked Question may be Answered by a Quick Video


Added lines 29-31:

Frequently Asked Question


December 21, 2016, at 06:50 PM by 127.0.0.1 -
Changed line 9 from:

This is me on YouTube - for lots of POWER Chips, Power Systems machines, AIX, PowerVM, PowerVC, PowerSC, Linux on Power videos

to:

This is Nigel Griffiths' YouTube - for lots of POWER Chips, Power Systems machines, AIX, PowerVM, PowerVC, PowerSC, Linux on Power videos

Changed lines 106-107 from:
to:
  1. Is sharing nmon data capture file a possible security risk?
Changed lines 1011-1014 from:

Question 64: How do I use User Defined Disk Groups to monitor large numbers of disks in ESS disk ranks?

to:

Question 63: How do I use User Defined Disk Groups to monitor large numbers of disks in ESS disk ranks?

Added lines 1057-1069:

Question 63: Is sharing nmon data capture file a possible security risk?

  • Briefly the data that might be a risk:
    • nmon filename includes the hostname
    • network config - IP Address and hostname
    • File system mount points can include the names of products in use, such as the RDBMS
    • System serial numbers
  • The paranoid could remove or change these with a script.
  • Nothing of a major risk here.
  • See my AIXpert Blog article for the full information - nmon data files are they a security risk
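The "remove or change these with a script" idea could be sketched as below. This is only a minimal sketch: the function name scrub_nmon and the replacement strings are made up for illustration, and the IPv4 pattern is deliberately crude, so it will also mask other dotted four-number strings such as package levels.

```shell
# scrub_nmon HOSTNAME  (hypothetical helper, not part of nmon)
# Reads an nmon capture on stdin and writes a masked copy to stdout.
scrub_nmon() {
    host="$1"
    # Mask every occurrence of the hostname, then anything shaped like an IPv4 address.
    sed -e "s/${host}/HOSTNAME/g" \
        -e 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/x.x.x.x/g'
}

# Example use (file names are placeholders):
#   scrub_nmon "$(hostname)" <this_050607_0916.nmon >scrubbed.nmon
```

The output file should be given a name without the hostname, and the result checked by eye before sharing. Serial numbers and mount point names are not handled by this sketch.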
December 21, 2016, at 06:39 PM by 127.0.0.1 -
Changed lines 7-25 from:

The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.

to:

Prefer to watch a YouTube Video from the nmon designer / developer?

This is me on YouTube - for lots of POWER Chips, Power Systems machines, AIX, PowerVM, PowerVC, PowerSC, Linux on Power videos

  • https://www.youtube.com/user/nigelargriffiths

nmon on AIX on Power

  • There are 3 videos for getting started with nmon for AIX, which take roughly 39 minutes in total
  1. Online use https://www.youtube.com/watch?v=jH7TnnFDWVg
  2. Data capture https://www.youtube.com/watch?v=oX-buCI53LY
  3. Graphing with nmon Analyser (Excel) https://www.youtube.com/watch?v=jH7TnnFDWVg

nmon for Linux on Power, x86/AMD64, mainframe, ARM.

  • There are 3.5 videos for getting started with nmon for Linux, which take roughly 57 minutes in total
  1. Install, download and online https://www.youtube.com/watch?v=prVzcj3vXNc
  2. Data Capture https://www.youtube.com/watch?v=_PDAQLflfEc
  3. Graphing with nmonchart https://www.youtube.com/watch?v=5P4neOqoCTo
  • If you don't mind using Excel, also take a look at the nmon Analyser above
  • Ignore the videos which are more than 2 years old as they are based on older versions although they are still mostly true.
December 21, 2016, at 06:33 PM by 127.0.0.1 -
Changed lines 933-935 from:
to:
  • Also see my AIXpert Blog article - [[https://www.ibm.com/developerworks/community/blogs/aixpert/entry/nmon_and_External_Data_Collectors|nmon_and_External_Data_Collectors]]
Changed line 938 from:
  • This is a AIX feature. Work Load Management statistics are started with: W (upper-case) to see them. Note: AIX 433 does not support the gathering of WLM stats. Work Load Management - this is the major benefit of AIX and no charge too. I have written a white paper on this find it at: http://www.ibm.com/developerworks/aix/library/au-Practical_WLM.html If you use passive mode you can use WLM to find out which applications are taking the CPU, RAM and IO resources of the machine with zero overhead. I tested WLM and could not detect WLM taking any resources at all or at least below 0.25% of one CPU. nmon outputs
to:
  • This is a AIX feature. Work Load Management statistics are started with: W (upper-case) to see them. Note: AIX 433 does not support the gathering of WLM stats. Work Load Management - this is the major benefit of AIX and no charge too. I have written a white paper on this find it at: http://www.ibm.com/developerworks/aix/]/au-Practical_WLM.html If you use passive mode you can use WLM to find out which applications are taking the CPU, RAM and IO resources of the machine with zero overhead. I tested WLM and could not detect WLM taking any resources at all or at least below 0.25% of one CPU. nmon outputs
Added lines 991-1043:

Question 64: How do I use User Defined Disk Groups to monitor large numbers of disks in ESS disk ranks?

  • On a recent benchmark with 3 x ESS = 1024 disks it became impossible to monitor them to ensure balanced I/O loading, so this feature was developed. The idea is to merge the disks into sets and monitor the sets. It is like the adapter stats but you get to choose which disks go into which set (adapter). Three obvious ways of doing this are by the:
    • disk use = group disks that have common data, for example a database's data, index, sort, logs, archive = 5 disk groups
    • disk placement = the disks in a particular rack/drawer for example ESS, cluster, rank, loop - makes 8 groups per ESS
    • disk type or volume group/logical volume
    • Or anything else you think up.
  • To set this up create a file with:
    • one line per disk group
    • starting with the name of the group
    • then a list of hdisks
    • all space separated
  • Then start nmon with the following option: -g filename
    • If online hit: g
    • If saving to a file there will be more sections for diskgroups = DGxxxx. The nmon analyser understands these new sections thanks to Stephen Atkins its developer.
  • Here are a few examples:
    • For my ESS placement disk groups I used the following script (this assumes you have the lsess command installed):
    • Creating the ESS disk group file example
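    • # Temporary work files
      FILE1=/tmp/lsess_array.tmp1
      FILE2=/tmp/lsess_array.tmp2
      # Save the lsess output, then extract the unique ESS identifiers from column 3
      lsess >$FILE1
      grep hdisk $FILE1 | grep -v "not ready" | awk '{ print $3 }' | cut -b 4-8 | sort | uniq >$FILE2
      for j in `cat $FILE2`
      do
      	# One output line (disk group) per identifier listed here, for each ESS
      	for i in 1100 1101 1300 1301 1500 1501 1700 1701 1000 1001 1200 1201 1400 1401 1600 1601
      	do
      		# Group name first, then the matching hdisks on the same line
      		printf "ESS%s_%s " "$j" "$i"
      		grep hdisk $FILE1 | grep $j | grep ${i} | awk '{ printf " %s", $1 }'
      		echo
      	done
      done
      rm $FILE1 $FILE2
      exit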
      
    • and generated the following disk group file:
    • Generated file:
      
      array_1100  hdisk44 hdisk45 hdisk46 hdisk47 hdisk48 hdisk49 hdisk50 hdisk51
      array_1101  hdisk52 hdisk53 hdisk54 hdisk55 hdisk56 hdisk57 hdisk58 hdisk59
      array_1300  hdisk60 hdisk61 hdisk62 hdisk63 hdisk64 hdisk65 hdisk66 hdisk67
      array_1301  hdisk68 hdisk69 hdisk70 hdisk71 hdisk72 hdisk73 hdisk74 hdisk75
      		... etc.
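For the "disk use" style of grouping, a disk group file in the format described above (group name, then its hdisks, all space separated) could be built from a simple list of disk and purpose pairs. This is a hedged sketch: the helper name make_disk_groups and the two-column input format are assumptions made for illustration, not something nmon provides.

```shell
# make_disk_groups  (hypothetical helper)
# stdin:  one "hdisk group-name" pair per line
# stdout: one line per group: the group name, then its hdisks, all space separated
make_disk_groups() {
    awk 'NF >= 2 { members[$2] = members[$2] " " $1 }
         END     { for (name in members) print name members[name] }' | sort
}

# Example use (file names are placeholders):
#   make_disk_groups <disk_purposes.txt >diskgroups.txt
#   nmon -g diskgroups.txt
```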
      
December 21, 2016, at 06:19 PM by 127.0.0.1 -
Changed lines 87-89 from:
to:
  1. Can I reset the peak counters for disks, network, AIO (AIX only) and CPU graphs online?
Deleted lines 967-968:
Added lines 983-990:

Question 62: Can I reset the peak counters for disks, network, AIO (AIX only) and CPU graphs online?

  • The Network, Disk stats (not graphs; hit D, upper-case d) and AIO statistics track the peak values and display them. The CPU graphs also provide a peak indicator.
  • These can all be reset to zero by typing 0 (zero).
December 21, 2016, at 03:33 PM by 127.0.0.1 -
Added line 84:
Changed lines 86-89 from:
to:
  1. How to start nmon file collection with cron?
Added lines 967-984:

Question 61: How to start nmon file collection with cron?

  • The default nmon capture filenames have been carefully chosen. If you save the output of many machines and captures in one directory and list the directory, the files are ordered first by machine hostname and then by date and time. This is a sensible ordering. Many people have written scripts to start nmon via cron, and many of these scripts are a complete waste of time or even wrong. One feature that was added to nmon to make this easy is the -m flag, which makes nmon move to a particular directory before saving data.
  • So here is what I put in my crontab (use crontab -e to add tasks to your crontab file). This collects the data once a day in the directory /home/nmon_data, taking a snapshot every 5 minutes for 288 snapshots (5 minutes x 288 = 24 hours), which makes an excellent graph detail level. It also collects top processes and user command lines (T), NFS stats (N), Workload Manager but no Subclasses (W), Large page stats (L) and Asynchronous I/O details (A). The reporting threshold is 0.001 percent of a CPU.
    • cron entry example
    • 0 0 * * * /usr/lbin/nmon_aix53 -fTNWLA -I 0.001 -s 300 -c 288 -m /home/nmon_data
      
  • There is no need of any shell scripts to start this collection.
  • Note: if you start two nmon processes at the same time they will have the same filename. So if you want to collect, for example, both detailed and summary stats, start them at least one minute apart. If I also wanted hourly statistics with fewer top process details, a second crontab entry might be:
    • cron entry example
    • 2 0 * * * /usr/lbin/nmon_aix53 -ftNWLA -s 3600 -c 24 -m /home/nmon_data
      
  • Also note that only one of f, F, z, x or X should be used and it should be the first argument. You have been warned as not following this can cause confusion.
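The -s and -c values in the crontab entries are linked by simple arithmetic: the snapshot count is the period to cover divided by the interval. A minimal sketch of that calculation follows; the helper names snapshots_for and nmon_capture_args are hypothetical, and only the -f, -s, -c and -m options shown in the entries above are used.

```shell
# snapshots_for INTERVAL_SECONDS HOURS  (hypothetical helper)
# Prints the -c count needed to cover HOURS at one snapshot every INTERVAL_SECONDS.
snapshots_for() {
    echo $(( $2 * 3600 / $1 ))
}

# nmon_capture_args INTERVAL_SECONDS HOURS DIRECTORY  (hypothetical helper)
# Prints the basic capture options for a run of that length.
nmon_capture_args() {
    echo "-f -s $1 -c $(snapshots_for "$1" "$2") -m $3"
}
```

For a 24 hour capture, an interval of 300 seconds gives 288 snapshots and an interval of 3600 seconds gives 24, matching the two entries.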
December 21, 2016, at 03:29 PM by 127.0.0.1 -
Changed lines 83-85 from:
to:
  1. How to use the AIX Workload Manager stats?
  2. How to change the Top Processes Minimum CPU Threshold?
Added lines 873-874:

https://www.ibm.com/developerworks/community/wikis/form/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/994db5d5-e1cd-4ab1-adc9-6217a29036ce/attachment/8c8cbc56-5681-436b-b974-d88bf9f373ea/media/process_count_graph.jpg

Changed lines 930-963 from:
to:

Question 59: How to use the AIX Workload Manager Statistics?

  • This is an AIX feature. Work Load Management statistics are shown by pressing W (upper-case). Note: AIX 433 does not support the gathering of WLM stats. Work Load Management is a major benefit of AIX and comes at no charge too. I have written a white paper on this, find it at: http://www.ibm.com/developerworks/aix/library/au-Practical_WLM.html If you use passive mode you can use WLM to find out which applications are taking the CPU, RAM and IO resources of the machine with practically zero overhead: I tested WLM and could not detect it taking any resources at all, or at least less than 0.25% of one CPU. nmon outputs:
  • actual resource use percentage per class
  • desired percent AIX sets as a target based on active class shares and limits. These are worth watching as, for example, classes without processes get zero targets. See the junk class in the example below.
  • share values (-1 means it is not set)
  • number of processes per class (try for zero in Default class)
  • class Inheritance and Shared Memory flags
  • Is there missing data you need? Remember that things like min, hard and soft limits exist for CPU, RAM and Block IO for each class, so there are limits to what we can output on the screen. The -S option allows you to see sub-classes, but if you have lots they may not fit on the screen or may overrun the captured data file line length limit of Excel.
  • The nmon file capture records the full WLM details once (at the start) in the BBBP section but then only the actual resources used to reduce output. Online the output looks like this:
    • Online WLM example
    • Work Load Manager CPU MEM BIO  CPU MEM IO  CPU   MEM   BIO     Tier Inheritance
      Class Name       |---Used----||--Desired-||----Shares-----|Proc's T I Localshm
      Unclassified       0%  0%  0% 100 100 100    -1    -1    -1     1 0 0 0
      Unmanaged          0% 11%  0% 100  99 100    -1    -1    -1     1 0 0 0
      Default            0% 29%  0% 100  98 100    -1    -1    -1    34 0 0 0
      Shared             0% 21%  0% 100  98 100    -1    -1    -1     0 0 0 0
      System             0% 50%  0% 100  99 100    50    -1    -1    80 0 0 0
      database          72%  0%  0%  75 100 100   300    -1    -1     9 0 1 0
      batch             26%  0%  0%  25 100 100   100    -1    -1     4 0 1 0
      junk               0%  0%  0% 100 100 100   400    -1    -1     0 0 0 0
      

Question 60: How to change the Top Processes Minimum CPU Threshold?

  • nmon will not save to file any process using less than 0.1% of a CPU. This is to reduce the file output to useful information. But 0.1% of the fastest CPU is now quite a lot of CPU power, so the threshold is now changeable using the -I option. This was requested by an nmon user as a useful idea. So add the following option when you start nmon:
    • -I <percent>
  • This sets the Ignore Process Percent threshold (default 0.1), i.e. don't save TOP stats if a process is using less CPU than this percentage. Example:
    • nmon -f -s 10 -c 300 -I 0.01
  • This will mean a lot more top processes statistics will be gathered.
Changed line 968 from:

Take this link for historically old stuff Very Old nmon and Very Old AIX Questions

to:

Take this link for historically old stuff Very Old nmon and Very Old AIX Questions

December 21, 2016, at 03:04 PM by 127.0.0.1 -
Changed lines 812-813 from:

Question 56: How to use External Data Collectors with nmon?

to:

Question 57: How to use External Data Collectors with nmon?

Changed line 825 from:

[@

to:
  • [@
Changed line 837 from:

[@

to:
  • [@
Changed line 843 from:

[@

to:
  • [@
Changed line 848 from:

[@

to:
  • [@
Changed line 855 from:

[@

to:
  • [@
Changed lines 879-880 from:

Question 56: How to RDBMS Oracle Transaction Counters External Data Collectors Example?

to:

Question 58: How to RDBMS Oracle Transaction Counters External Data Collectors Example?

Changed line 884 from:

[@

to:
  • [@
Changed line 889 from:

[@

to:
  • [@
Changed line 909 from:

[@

to:
  • [@
December 21, 2016, at 03:01 PM by 127.0.0.1 -
Changed lines 81-82 from:
to:
  1. How to use an External Data Collector with nmon?
  2. How to RDBMS Oracle Transaction Counters External Data Collectors Example?
Added lines 810-924:

Question 56: How to use External Data Collectors with nmon?

  • The external data collectors feature is to get nmon to run other commands that you can then add to the nmon data file for analysis. A typical example is to collect DB2 or Oracle stats to compare against nmon data. You can run a command when:
    • nmon starts using the shell variable NMON_START
    • nmon ends using the shell variable NMON_END
    • each snap shot using the shell variable NMON_SNAP
    • a subset of snap shots using the shell variable NMON_ONE_IN
      • This is controlled by shell variables set before you run nmon. The separate file that the data collectors generate is merged into the nmon file with the cat command before analysis. You don't need to set all of these: you could use just start and end, just the snapshots, or a special start-up plus snapshots. This is a bit complex so here is a worked example.
    • First set the TIMESTAMP shell variable:
    • if TIMESTAMP = 0, then lines will have the classic nmon Tnnnn timestamps at the start of the line and work well with the nmon data file
    • if TIMESTAMP = 1, then lines will have a timestamp that has the hours, minutes, seconds and day, month, year - this can be used if you don't want to merge the data with the nmon file for analysis.
  • Setting the shell variables
export TIMESTAMP=0
export NMON_START="mystart"
export NMON_SNAP="mysnap"
export NMON_END="myend"
export NMON_ONE_IN=1        # 1 is the default
  • We set the above shell variables, so they refer to a program or shell script
  • If the mystart, myend, mysnap contain the following shell scripts
    • mystart
ps -ef >start_ps.txt
echo "PROCCOUNT,Process Count, Procs" >ps.csv
  • mysnap
echo PROCCOUNT,$1,`ps -ef | wc -l` >>ps.csv
  • myend
ps -ef >end_ps.txt
  • Now run nmon as normal, for example: nmon -f -s 2 -c 10
  • At the end of the capture, the ps.csv file might contain (for example):
PROCCOUNT,T0001,56
PROCCOUNT,T0002,58
PROCCOUNT,T0003,67
PROCCOUNT,T0004,65
PROCCOUNT,T0005,71
PROCCOUNT,T0006,68
PROCCOUNT,T0007,66
PROCCOUNT,T0008,58
PROCCOUNT,T0009,57
PROCCOUNT,T0010,60
  • The start_ps.txt and end_ps.txt files would have a list of running processes at the time. The ps.csv file can be merged with the nmon output file (below called this_050607_0916.nmon, yes my machine is called "this") after nmon finishes with the following command:
    • cat this_050607_0916.nmon ps.csv >combined.csv
  • Then run the nmon Analyser on the combined file - if you are lucky, the analyser may draw you a graph. Here is what was produced:
  • Hints:
    • comma separate the data and don't go over 2K bytes in line length
    • make the important data in the first couple of columns.
    • keep the stats in the same range - i.e. all KB/s or all percentages
  • If you set the NMON_ONE_IN variable you can also run the NMON_SNAP command less often!!
  • By default this is set to 1 (run it every time), but if the command you want to capture is heavy in CPU terms or takes a long elapsed time to finish, you can run it less often. For example, to run it in just one in ten snapshots: export NMON_ONE_IN=10
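Before relying on a snapshot script in a real capture, it can help to dry-run it by hand. The sketch below is a hypothetical test harness, not part of nmon: it simply calls a command with T0001, T0002 and so on as the first argument, which is the value the mysnap script above reads from $1.

```shell
# dry_run_snap COMMAND COUNT  (hypothetical test harness, not part of nmon)
# Calls COMMAND COUNT times, passing T0001, T0002, ... as its first argument.
dry_run_snap() {
    cmd="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]
    do
        "$cmd" "$(printf 'T%04d' "$i")"
        i=$(( i + 1 ))
    done
}

# Example use (assumes mysnap is in the current directory and executable):
#   dry_run_snap ./mysnap 3 ; cat ps.csv
```

If the resulting lines look right, the same script can be named in NMON_SNAP for the real capture.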

Question 56: How to RDBMS Oracle Transaction Counters External Data Collectors Example?

  • Here is another example collecting transaction commits and rollback statistics from the Oracle database using two scripts called oraclestart and oraclesnap that run an SQL statement and save the data in a file called dbstats.csv:
  • oraclestart
echo "DATABASE,Transactions,commit,rollback" >dbstats.csv
  • oraclesnap
export ORACLE_SID=MYDATABASE
( sqlplus -s "system/manager as sysdba" <<EOF
set heading off
set headsep off
set echo off
set lines 2000
set feedback off
set newpage none
set recsep off
select 'DATABASE,$1,'||
	sum(decode(name, 'user commits', value, 0))||','||
	sum(decode(name, 'user rollbacks', value, 0))
	from
		sys.v_\$sysstat;
EOF
) >> dbstats.csv
  • Setting up the shell variables
export TIMESTAMP=0
export NMON_START="oraclestart"
export NMON_SNAP="oraclesnap"
unset NMON_END
  • Now run nmon
  • You need to ensure the ORACLE_SID, username and password work in your environment. Do this by running the command manually with: oraclesnap T9999
  • And checking the results in the file dbstats.csv
  • This should put one line in the file dbstats.csv. This script has to log on to the Oracle database each time it runs, so you should not be doing this every second as it will take elapsed time and CPU resources. But if you are collecting nmon data once a minute or more this overhead should be small.
  • Thanks to Ralf Schmidt-Dannert of the IBM SAP and Oracle Solutions team in Minneapolis, USA for this example.
  • One Caveat on External Data Collectors
    • The "T" or "t" as the first letter of the second column is used by tools to recognise the difference between new header lines of new data sections and the data lines (i.e. those containing the timestamp values for example, T0000, T0001, etc.) So do not use a header line like "PROCCOUNT,The Process Count, Procs" - the "T" in "The" will cause problems.
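The caveat above can be checked mechanically before merging a collector file. This is a minimal sketch, assuming that valid data lines carry a Tnnnn timestamp (a capital T followed only by digits) in the second column; the helper name check_collector_headers is made up for illustration.

```shell
# check_collector_headers FILE  (hypothetical helper)
# Prints any line whose second column starts with T or t but is not a
# Tnnnn timestamp, and returns non-zero if such a line was found.
check_collector_headers() {
    awk -F, '$2 ~ /^[Tt]/ && $2 !~ /^T[0-9]+$/ { print FILENAME ": line " NR ": " $0; bad = 1 }
             END { exit bad + 0 }' "$1"
}

# Example use:
#   check_collector_headers ps.csv && cat this_050607_0916.nmon ps.csv >combined.csv
```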
December 21, 2016, at 02:32 PM by 127.0.0.1 -
Added lines 86-87:
  1. How to report a problem? OS version, nmon version, the actual command run, have you read the nmon -h output, the problem, have you tried something simpler, send me a sample file, or screen capture.
December 20, 2016, at 09:59 AM by 127.0.0.1 -
Added lines 81-82:
Deleted line 84:
  1. What version of nmon am I running?
December 20, 2016, at 09:58 AM by 127.0.0.1 -
Changed lines 18-19 from:

0 Which nmon version am I running?

to:
  1. Which nmon version am I running?
Changed lines 24-25 from:
  1. Can you add the monitoring tape drive on AIX?
to:
Changed line 80 from:
to:
  1. Can you add the monitoring tape drive on AIX?
Changed line 88 from:

Question 0: Which nmon version am I running?

to:

Question 1: Which nmon version am I running?

Changed lines 98-99 from:

Question 1: Which nmon for my version of AIX or Linux?

to:

Question 2: Which nmon for my version of AIX or Linux?

Changed lines 111-112 from:

Question 2: nmon crashes shortly after starting a data capture, please fix this send me the next version?

to:

Question 3: nmon crashes shortly after starting a data capture, please fix this send me the next version?

Changed line 118 from:

Question 3: Significant nmon dates?

to:

Question 4: Significant nmon dates?

Changed lines 138-139 from:

Question 4: All I get is "nmon not found"?

to:

Question 5: All I get is "nmon not found"?

Deleted lines 147-157:

Question 5: Can you add the monitoring tape drive on AIX?

  • AIX
    • No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.
    • Yes - if your tape drive is Fibre Channel connected it is very common to have it connected on a different FC adapter to allow performance settings to suit the tape drive = streams of large blocks.
    • In this case, use the Adapter stats using the ^ key or -^ startup option to monitor the tape(s).
  • Linux
    • No FC Adapter options for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP.
Added lines 795-806:

Question 56: Can you add the monitoring tape drive on AIX?

  • AIX
    • No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.
    • Yes - if your tape drive is Fibre Channel connected it is very common to have it connected on a different FC adapter to allow performance settings to suit the tape drive = streams of large blocks.
    • In this case, use the Adapter stats using the ^ key or -^ startup option to monitor the tape(s).
  • Linux
    • No FC Adapter options for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP.
December 20, 2016, at 09:56 AM by 127.0.0.1 -
Changed line 18 from:
to:

0 Which nmon version am I running?

Added lines 88-96:

Question 0: Which nmon version am I running?

  • AIX
    • nmon -v - Will just say TOPAS-NMON
    • lslpp -w /usr/bin/nmon - Details that nmon is in the bos.perf.tools package
    • lslpp -l bos.perf.tools - Details the version number of that package. Normally that is very similar to the AIX version: oslevel -s
  • Linux
    • nmon -V - Details the nmon version
    • The version is also displayed at the top left when used interactively on a terminal.
December 20, 2016, at 09:45 AM by 127.0.0.1 -
Changed lines 79-80 from:
to:
  1. What files does nmon for Linux use to get its data?
Changed lines 84-87 from:
  1. What files does nmon for Linux use to get its data?
  2. Who does nmon for AIX extract its data?
to:
  1. How does nmon for AIX extract its data?
Added lines 759-795:

Question 55: What files does nmon for Linux use to get its data?

  1. nmon for Linux reads the text from the following files. Hopefully the file names explain the data in each. If not, go and have a look.
  2. There is some information available: from the Linux manual: man 5 proc
    • but it is also vague and does not explain units or why the data is sometimes missing.
  3. Performance stats
    • /proc/cpuinfo
    • /proc/stat
    • /proc/version
    • /proc/meminfo
    • /proc/uptime
    • /proc/loadavg
    • /proc/net/rpc/nfs
    • /proc/net/rpc/nfsd
    • /proc/vmstat
    • /proc/ppc64/lparcfg - POWER systems only
    • /proc/diskstats
    • /proc/partitions
    • /proc/net/dev
  4. Process stats where PID is replaced with the Process ID number in turn
    • /proc/PID/stat
    • /proc/PID/statm
    • /proc/PID/io
  5. Configuration data - includes the above in full text and then these too
    • /proc/device-tree/host-model
    • /proc/device-tree/host-serial
    • /proc/device-tree/ibm,partition-name
    • /proc/diskinfo
    • /proc/sysinfo
    • /proc/modules
  6. Some extra data is extracted using classic UNIX system calls, like those that detail the file systems and mount points
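To get a feel for how plain these /proc files are, here is a hedged sketch of reading two of them from the shell. It is not nmon's own code; the helper names meminfo_kb and load_average_1min are hypothetical, and the optional file argument exists only so the functions can be tried against a saved copy.

```shell
# meminfo_kb FIELD [FILE]  (hypothetical helper, not nmon's own code)
# Prints the numeric value of FIELD from a file in /proc/meminfo format.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# load_average_1min [FILE]  (hypothetical helper)
# Prints the first field of a file in /proc/loadavg format.
load_average_1min() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Example use on a Linux system:
#   meminfo_kb MemFree
#   load_average_1min
```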
December 19, 2016, at 03:21 PM by 127.0.0.1 -
Changed lines 760-761 from:
to:
      - - - The End - - -
Changed lines 764-895 from:

Very Old nmon and Very old AIX questions = Historic Interest Only



Very Old nmon and Very Old AIX Questions

OLD Question 1: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

  • Hard luck
  • I will actively help get AIX 5 bugs fixed but older versions are very much less interesting.
  • In particular, on AIX 4.1.5 the TOP processes does not work but I am not going to fix it unless some one offers me hard currency

OLD Question 2: Can I get the adapters stats from other tools?

  • AIX
    • Not in AIX 4 - there are no adapter stats in this AIX.
    • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. XXX

OLD Question 3: When I start nmon 9 on a system where it used to run fine I now get an error message?

  • The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon -there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a.

OLD Question 4: Can you add the monitoring of process priority?

  • Available from the AIX 5.1 onwards
  • Always available in nmon for Linux

OLD Question 5: nmon on AIX, nmon 9 does not run, please fix?

  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • 95% of the time it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses, but if the /unix file does not match what is actually running, you get this message.

You can also get really weird effects, if you have messed up LIBPATH.

OLD Question 6: Old nmon version question: nmon and AIX commands do not agree?

  • A lot of this happens with nmon 10 and the Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partition.
  • Some of it is because the AIX commands are very unclear about what they are reporting.
  • What was CPU numbers can now be physical CPU, Logical CPU or Virtual CPU numbers and the documentation is unclear.
  • So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.
  • also see question 26.

OLD Question 7: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

  • AIX
    • Correct, this data is not available on AIX 5.1 from the libperfstat library.
    • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
    • Recommended action upgrade AIX as 5.1 is not supported without purchasing extended support.

OLD Question 8: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

  • The error is something like:
    • exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors:
    • 0509-150 Dependent module libperfstat.a(shr.o) could not be loaded.
    • 0509-022 Cannot load module libperfstat.a(shr.o).
    • 0509-026 System error: A file or directory in the path name does not exist.
  • You will need to have installed the libperfstat library from the AIX CDROMs.
  • This is in bos.perf.libperfstat package.
  • I hope you realise that AIX 5.1 is not normally supported as it is so old.

OLD Question 9: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

  • This has been reported shortly after an upgrade to a AIX 5.3 higher ML (like ML5 or ML6) and reboot.
  • After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen.
  • The problem seems to be nmon jumping to a library like libperfstat and the jump vectors are not right so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course).
  • This is a bug in AIX and its update process where the libperfstat kernel package does not match the library.
  • Try the following command: # lslpp -L | grep -i perfstat
  • You may get something like:
# lslpp -L | grep -i perfstat
  bos.perf.libperfstat      5.3.0.50    C     F    Performance Statistics Library
  bos.perf.perfstat         5.3.0.60    C     F    Performance Statistics
  • Update the package bos.perf.libperfstat to the same (5.3.0.60) or at least much closer levels (like 5.3.0.60 and 5.3.0.61) as bos.perf.perfstat. Preferably, the latest available levels.

OLD Question 10: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"

  • This has been reported shortly after an upgrade - some machines have this problems while others don't.
  • There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery.
  • The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true.
  • The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:
  • ERROR: Assert Failure in file="nmon11.c" in function="main" at line=3300
    ERROR: Reason=System call returned -1
    ERROR: Expression=[[perfstat_diskadapter((perfstat_id_t * )FIRST_DISKADAPTER, p->adapt, sizeof(perfstat_diskadapter_t), adapters)]]
    ERROR: errno=22
    ERROR: errno means : Invalid argument
    
  • It has since been found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, kernel modules that need to be reloaded, or libperfstat in a twist. One thing we do know: it's not nmon! If you hit this problem:
  1. Check the software levels, see Question 53
  2. Do you think that you rebooted after the upgrade or do you know for absolutely sure!!
  3. Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
  4. Try the latest beta version of nmon (if it supports your AIX level).
  5. I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
  6. If it is still a problem, let us know via the usual AIX Performance Tools Forum.

OLD Question 11: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

  • This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!). The work around is:
  1. Do not include NFS statistics (remove the -N)
  2. Move to nmon12 that codes around these bugs.

OLD Question 12: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

  • This seems to happen in AIX 5.3 TL07 or there about. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.
  • nmon reports this problem by showing 0% - which is clearly impossible.
  • The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed (for at least some time but may reappear), if you reboot the machine/LPAR.
  • The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. You will have to ask AIX Support by asking for a fix to the libperfstat library to fix the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!
  • In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then do not use any of these percentages.
 - - - The End - - -
to:

Take this link for historically old stuff Very Old nmon and Very Old AIX Questions


December 19, 2016, at 03:19 PM by 127.0.0.1 -
Changed line 767 from:

[Very Old nmon and Very Old AIX Questions]

to:
December 19, 2016, at 03:18 PM by 127.0.0.1 -
Added line 767:

[Very Old nmon and Very Old AIX Questions]

December 16, 2016, at 09:45 AM by 127.0.0.1 -
Changed line 41 from:
  1. On AIX the disk adapter are wrong?
to:
  1. On AIX the disk adapters are wrong?
Changed lines 78-86 from:
to:
  1. The Disk stats are far too high or 100%, nmon is broken?

- To do

  1. What version of nmon am I running?
  2. What files does nmon for Linux use to get its data?
  3. How does nmon for AIX extract its data?
Added lines 728-761:

Question 54: The Disk stats are far too high or 100%, nmon is broken?

  • Check that your OS is at the current level
    • on Linux upgrade it and
    • on AIX run oslevel -s (the last four digits are the two-digit year, so 16 = 2016, followed by the week number). If your AIX is two years out of date you badly need to update.
  • Just because a disk is 100% busy it does not necessarily mean you have a performance issue.
    • Perhaps you really are reading a large file(s)!
    • If your CPUs deal with the data faster than the disk can deliver it then you get a 100% busy disk.
  • What happens when you just look at the stat numbers ?
    • nmon
    • Then D
  • If on AIX try an alternative command like topas -D - Does that give you the same stats?
  • If on Linux, if necessary, install sysstat and check the iostat output
  • If they show the same sorts of disk I/O numbers then it is very likely you are doing the I/O.
  • Look at top processes which are doing lots of I/O with: nmon
    • Then t 5
  • For Linux: If it still looks wrong report the problem at the DeveloperWorks Performance Tools Forum - or email me if you can work out the email address
  • For AIX: If you think you have a performance issue or an nmon issue
    1. create a snap
    2. capture the workload: perfpmr
    3. raise a PMR
    4. Note AIX Support is not there to analyse your nmon data and graphs for comment. Just like they are not there to correct your spelling in files created with vi!
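
The oslevel -s check above can be sketched in shell. This is a minimal illustration only: the helper name oslevel_build_date is made up for this example, and it assumes the string has the layout release-TL-SP-YYWW (for example 7100-04-02-1614), with the last four digits being the two-digit year and the week number as described above.

```shell
# Hypothetical helper: decode the last field of an "oslevel -s" string.
# Assumed layout: release-TL-SP-YYWW, for example 7100-04-02-1614.
oslevel_build_date() {
    yyww=${1##*-}     # keep the text after the last "-"
    yy=${yyww%??}     # drop the last two characters: the year digits remain
    ww=${yyww#??}     # drop the first two characters: the week digits remain
    echo "year=20${yy} week=${ww}"
}

# Example with a fixed string so it runs anywhere.
oslevel_build_date "7100-04-02-1614"
```

On an AIX machine the same function could be fed the live value with oslevel_build_date "$(oslevel -s)".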
Added line 763:

Added line 765:

December 16, 2016, at 09:26 AM by 127.0.0.1 -
Changed line 30 from:
  1. Question 10: Why do you support all these old unsupported AIX versions?
to:
  1. Why do you support all these old unsupported AIX versions?
Changed lines 38-40 from:
  1. NANQ and INF?
  2. nmon reports more than 100% for a process - clearly it is wrong?
to:
  1. What are NANQ and INF?
  2. nmon reports more than 100% for a process - clearly it is wrong?
Changed line 71 from:
  1. Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?
to:
  1. Why isn't nmon for Linux on the Distro media or online repository or it is there but out of date?
Changed lines 636-637 from:

Question 47: nmon will no stay running - What should I check?

to:

Question 47: nmon will not stay running - What should I check?

Changed line 667 from:

Question 48: Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?

to:

Question 48: Why isn't nmon for Linux on the Distro media or online repository or it is there but out of date?

December 16, 2016, at 09:22 AM by 127.0.0.1 -
Changed lines 25-26 from:
  1. What is the most reported error for nmon?
to:
  1. What are the most reported errors for nmon?
Changed lines 141-143 from:

Question 6: What is the most reported error for nmon?

  1. nmon crashes have it starts in collecting to a file mode.
to:

Question 6: What are the most reported errors for nmon?

  1. nmon crashes as it starts in collecting to a file mode.
Changed lines 146-147 from:
  • Quite often the nmon output file is empty due to not waiting long enough - if you request data every 5 minutes with 16 minutes before you try analysing the file!
  • Incomplete if nmon is still running andoutputing data you can grab the file and not have a complete resord right at the end - you could edit with vi to remove the last set of output - see the lines starting ZZZ
to:
  • Quite often the nmon output file is empty or only has the config info due to not waiting long enough - if you request data every 5 minutes then wait 16 minutes (three snapshots of performance data) before you try analysing the file!
  • Incomplete last line: if nmon is still running and outputting data when you grab the file, the last line of the file may be incomplete - you could edit the file with vi to remove the last set of output - see the lines starting ZZZ.
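
As an alternative to editing by hand with vi, removing the last set of output can be sketched with awk. This is only a sketch: the function name trim_last_snapshot is made up for this example, and it assumes every snapshot begins with a line starting "ZZZZ," so that everything from the last such line onward is the final (possibly incomplete) snapshot. It holds the whole file in memory, so it suits modest file sizes.

```shell
# Hypothetical helper: print an nmon file without its last snapshot.
# Assumption: each snapshot starts with a line beginning "ZZZZ,".
trim_last_snapshot() {
    awk '
        { line[NR] = $0 }            # remember every line
        /^ZZZZ,/ { last = NR }       # note where the newest snapshot starts
        END {
            # Stop just before the last snapshot; if none was seen, keep everything.
            stop = (last > 0) ? last - 1 : NR
            for (i = 1; i <= stop; i++) print line[i]
        }
    ' "$1"
}
```

Typical use would be trim_last_snapshot grabbed.nmon > trimmed.nmon, working on a copy and leaving the file nmon is still writing untouched.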
Changed line 149 from:
  • But it turns out its all ready implemented (and has been for a few years)
to:
  • But it turns out XYZ is already implemented (and has been for a few years)
Changed line 152 from:
  • Turns out they can't read nmon -h output which states: The -f or -F MUST be the first option on the line
to:
  • Turns out the user can't read the nmon -h output which states: The -f or -F MUST be the first option on the line
Changed line 154 from:
  • First do your home work by learn UNIX and Linux performance statistics: read the command manuals, take a course or spend 5 years benchmarking
to:
  • First do your homework by learning UNIX and Linux performance statistics: read the command manuals, take a course or spend 5 years in a benchmark centre.
Changed lines 156-160 from:
  1. The AIX and Linux memory stats are different?
    • The answer is "Yes you are correct". Some of the basics map OK like memory total size and memory free but the bulk are very different.
    • Also note that early Linux on Intel/AMD had to cope with small memory size with high and low memory areas. This has died out now with the move to 64 bit memory addressing.
    • They are very different and it not me forgetting to implement some of the stats.
    • For example the AIX NEWMEM starts are not available under Linux and never will be.
to:
  1. The AIX and Linux memory stats are different or missing?
    • The answer is "Yes you are correct". Some of the basic memory stats map OK between AIX and Linux for example: memory total size and memory free but the bulk are very different.
    • Also note that early Linux on Intel/AMD had to cope with small memory size with high and low memory areas due to 16 bit then 32 bit hardware. This has died out now with the move to 64 bit memory addressing.
    • Linux and AIX are very different in the memory area and it is not me forgetting to implement some of the stats.
    • For example, the AIX NEWMEM stats are not available under Linux and never will be.
December 09, 2016, at 12:37 PM by 127.0.0.1 -
Changed lines 77-78 from:
to:
  1. Sharing nmon files - Are they a security risk?
Added lines 709-719:

Question 53: Sharing nmon files - Are they a security risk?

  • If you are worried, remove the following:
    • Hostname - Which you can simply alter with vi or sed
    • IP address - Unlikely to be directly on the Internet anyway
    • Some process names show software that you are running - You may not want others to know what you use!
    • Same for some file system mount points.
    • Machine serial numbers - IBM does not recommend making these public
    • Old Machine types, firmware levels and OS level - Could be embarrassing!!
  • Are the files a risk? Nothing here that helps a hacker.
  • Read my AIXpert Blog Article at nmon Data Files: Are they a Security Risk?
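
Altering the hostname with sed, as suggested above, might look like the sketch below. The function name anonymise_host and the replacement name "anonymous" are made up for this example, and it assumes the hostname is recorded on a line of the form AAA,host,NAME. The hostname may well appear elsewhere too (in other lines or in the filename), so check what is left afterwards.

```shell
# Hypothetical helper: print an nmon file with the host line rewritten.
# Assumption: the hostname is stored on a line of the form "AAA,host,<name>".
anonymise_host() {
    sed 's/^AAA,host,.*/AAA,host,anonymous/' "$1"
}
```

A cautious follow-up is to run grep with the real hostname against the result to catch any remaining occurrences before sharing the file.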
December 09, 2016, at 09:51 AM by 127.0.0.1 -
Changed lines 76-78 from:
to:
  1. Adding External Data Collectors to nmon files so it graphs your extra data ?
Changed lines 701-707 from:
to:

Question 52: Adding External Data Collectors to nmon files so it graphs your extra data?

  • So you have extra data you want nmon to collect and add to the nmon file for graphing
  • nmon will help you collect the data at the right intervals and in the right format
  • So the data can simply be added to the end of the nmon file and graphed
  • The nmon Analyser is pretty good at graphing "unexpected extra data" provided the numbers are in a similar range.
  • Read the AIXpert Blog article for all the details nmon and External Data Collectors
December 05, 2016, at 02:02 PM by 127.0.0.1 -
Changed lines 19-75 from:
  • Question 1: Which nmon for my version of AIX or Linux?
  • Question 2: nmon crash shortly after starting a data capture please send me the next version?
  • Question 3: Significant nmon dates?
  • Question 4: All I get is "nmon not found"?
  • Question 5: Can you add the monitoring tape drive on AIX?
  • Question 6: What is the most reported error for nmon?
  • Question 7: Can I decide the filename it saves data too?
  • Question 8: What is the default output filename?
  • Question 9: I want nmon output piped into a further command, how?
  • Question 10: Why do you support all these old unsupported AIX versions?
  • Question 11: What if I want support?
  • Question 12: Why don't you add a Java front end to nmon and get graphical output?
  • Question 13: The command line options don't seem to work right for file capture?
  • Question 14: What is paging to a filesystem?
  • Question 15: Where can I get nmon and further information?
  • Question 16: TOP process stats get switched on when I request Asynchronous I/O stats?
  • Question 17: nmon2rrd fails, please fix it?
  • Question 18: NANQ and INF?
  • Question 19: nmon reports more than 100% for a process - clearly it is wrong?
  • Question 20: On AIX the disk adapter are wrong?
  • Question 21: on AIX the adapter busy goes over 100%. That is impossible surely?
  • Question 22: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?
  • Question 23: What about nmon for Windows?
  • Question 24: Seeing double the number of CPUs?
  • Question 25: Hello, I am new to UNIX and want to tune AIX, what do you recommend?
  • Question 26: CPU wait is too high, how can I reduce it?
  • Question 27: On AIX, free memory is near zero, how do I free more memory?
  • Question 28: How can I set numperm better?
  • Question 29: What format is the nmon output file?
  • Question 30: I have collected once a second for 8 hours but I can't get the Analyser to work?
  • Question 31: nmon does not work on my Linux machine!!
  • Question 32: When do we get nmon 10 for Linux?
  • Question 33: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?
  • Question 34: I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?
  • Question 35: What is CharIO (a column of the TOP processes stats)?
  • Question 36: On Linux the disk stats are all doubled?
  • Question 37: On AIX the disk seem to be mostly on the first adapter?
  • Question 38: On nmon for Linux the CPU Wait for IO number is zero or odd?
  • Question 39: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.
  • Question 40: I want to collect data every second and then see weekly and monthly reports. How?
  • Question 41: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?
  • Question 42: The Disk Busy stats are missing on AIX
  • Question 43: Sort order problems with massive nmon output files.
  • Question 44: Does nmon capture point in time stats or averages?
  • Question 45: When will nmon collect data from lots of machines or LPARs?
  • Question 46: When will nmon collect data like "topas -C"?
  • Question 47: nmon will no stay running - What should I check?
  • Question 48: Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?
  • Question 49: Do you have nmon presentations I could use for training others?
to:
  1. Which nmon for my version of AIX or Linux?
  2. nmon crash shortly after starting a data capture please send me the next version?
  3. Significant nmon dates?
  4. All I get is "nmon not found"?
  5. Can you add the monitoring tape drive on AIX?
  6. What is the most reported error for nmon?
  7. Can I decide the filename it saves data too?
  8. What is the default output filename?
  9. I want nmon output piped into a further command, how?
  10. Question 10: Why do you support all these old unsupported AIX versions?
  11. What if I want support?
  12. Why don't you add a Java front end to nmon and get graphical output?
  13. The command line options don't seem to work right for file capture?
  14. What is paging to a filesystem?
  15. Where can I get nmon and further information?
  16. TOP process stats get switched on when I request Asynchronous I/O stats?
  17. nmon2rrd fails, please fix it?
  18. NANQ and INF?
  19. nmon reports more than 100% for a process - clearly it is wrong?
  20. On AIX the disk adapter are wrong?
  21. on AIX the adapter busy goes over 100%. That is impossible surely?
  22. What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?
  23. What about nmon for Windows?
  24. Seeing double the number of CPUs?
  25. Hello, I am new to UNIX and want to tune AIX, what do you recommend?
  26. CPU wait is too high, how can I reduce it?
  27. On AIX, free memory is near zero, how do I free more memory?
  28. How can I set numperm better?
  29. What format is the nmon output file?
  30. I have collected once a second for 8 hours but I can't get the Analyser to work?
  31. nmon does not work on my Linux machine!!
  32. When do we get nmon 10 for Linux?
  33. The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?
  34. I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?
  35. What is CharIO (a column of the TOP processes stats)?
  36. On Linux the disk stats are all doubled?
  37. On AIX the disk seem to be mostly on the first adapter?
  38. On nmon for Linux the CPU Wait for IO number is zero or odd?
  39. On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.
  40. I want to collect data every second and then see weekly and monthly reports. How?
  41. How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?
  42. The Disk Busy stats are missing on AIX
  43. Sort order problems with massive nmon output files.
  44. Does nmon capture point in time stats or averages?
  45. When will nmon collect data from lots of machines or LPARs?
  46. When will nmon collect data like "topas -C"?
  47. nmon will no stay running - What should I check?
  48. Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?
  49. Do you have nmon presentations I could use for training others?
  50. nmon Analyser: What is Wavg?
  51. LPAR Tab/Statistics missing with Dedicated CPU mode?
Changed lines 683-684 from:
to:

Question 50: nmon Analyser: What is Wavg?

  • Wavg or WAVG is the Weighted Average.
  • This data is not in the nmon output file but calculated in the Analyser.
  • This is the average of the busy periods and largely ignores the idle times.
  • If you take 100 snapshots of a statistic, with 50 busy at 100% and 50 idle at 0%, then the average is 50% but that number does not really describe the situation and can be misleading.
  • The Analyser uses a mathematical trick to boost the importance of the busy times and discount the idle periods.
  • Take a look at the Analyser spreadsheet for the calculation.
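
To show the general idea with the 50 busy and 50 idle snapshots above, here is a small awk sketch. It is an assumption for illustration only: it weights each sample by its own value, sum(x*x)/sum(x), which is one common way to boost busy periods. The Analyser spreadsheet remains the reference for the calculation it really uses, and the function name wavg is made up for this example.

```shell
# Hypothetical helper: plain average versus a self-weighted average.
# Reads one utilisation sample per line on standard input.
# Assumption: each sample is weighted by its own value, sum(x*x)/sum(x).
wavg() {
    awk '{ s += $1; ss += $1 * $1; n++ }
         END {
             if (n == 0 || s == 0) { print "avg=0 wavg=0"; exit }
             printf "avg=%.1f wavg=%.1f\n", s / n, ss / s
         }'
}

# 50 snapshots at 100% busy and 50 at 0%, as in the example above.
awk 'BEGIN { for (i = 0; i < 50; i++) { print 100; print 0 } }' | wavg
# prints: avg=50.0 wavg=100.0
```

With this weighting the idle snapshots contribute nothing, so the result reflects how busy the system was while it was actually working.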

Question 51: LPAR Tab/Statistics missing with Dedicated CPU mode?

  • The LPAR Tab or LPAR Statistics lines in the nmon file are all about Shared CPU usage.
  • For a LPAR in Dedicated CPU mode these stats don't make any sense, so they are not collected.
  • This is NOT a bug.
December 05, 2016, at 12:33 PM by 127.0.0.1 -
Changed lines 73-74 from:
to:
  • Question 49: Do you have nmon presentations I could use for training others?
Added lines 672-682:

Question 49: Do you have nmon presentations I could use for training others?

  • We are not in the 1990s any more! The world has moved on.
  • The new way is watching YouTube videos to learn at your own speed and at a convenient time.
  • See the nmon Documentation for the list of nmon for Linux and nmon for AIX videos by me.

Roughly 45 minutes plus either of the two popular graphing tools: nmonchart (browser graphs) or nmon analyser (Excel) - which both work for Linux and AIX files.

December 05, 2016, at 12:19 PM by 127.0.0.1 -
Changed lines 72-74 from:
to:
  • Question 48: Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?
Changed lines 663-664 from:
to:

Question 48: Why isn't nmon for Linux on the Distro media or online repository or it is but out of date?

  • Good question but there are size limits to the typical Distribution 4 GB Media DVD.
  • Things are improving: for Ubuntu, Red Hat and SUSE - my focus for enterprise Linux.
  • But each has bizarre processes to get packages accepted and updated.
  • For POWER systems the IBM Internal repositories have current nmon for Linux for current and new SUSE and Red Hat releases.
  • For Ubuntu the person who added the original nmon for Linux 14g package - fell asleep for two years!! Only recently (Sept 2016) updating the package that might appear in Ubuntu in 2017.
  • If you, as a Linux user, request this from your Distributor, it might improve the situation.
Added line 702:
  • Always available in nmon for Linux
December 05, 2016, at 11:43 AM by 127.0.0.1 -
Changed lines 151-154 from:
  • The answer is "Yes you are correct". They are very different and it not me forgetting to implement some of the stats.
to:
  • The answer is "Yes you are correct". Some of the basics map OK like memory total size and memory free but the bulk are very different.
  • Also note that early Linux on Intel/AMD had to cope with small memory size with high and low memory areas. This has died out now with the move to 64 bit memory addressing.
  • They are very different and it not me forgetting to implement some of the stats.
  • For example the AIX NEWMEM starts are not available under Linux and never will be.
December 05, 2016, at 11:39 AM by 127.0.0.1 -
Changed line 4 from:

This is a work in progress - November 2016

to:

This is a work in progress and may never be finished - Last update Dec 2016.

December 05, 2016, at 11:38 AM by 127.0.0.1 -
Changed lines 149-151 from:
  • Sorry but I can't write nmon and teach the world the basics on UNIX/Linuxperformance tuning.
to:
  • Sorry but I can't write nmon and teach the world the basics on UNIX/Linux performance tuning.
  1. The AIX and Linux memory stats are different?
    • The answer is "Yes you are correct". They are very different and it not me forgetting to implement some of the stats.
December 05, 2016, at 11:36 AM by 127.0.0.1 -
Changed lines 17-18 from:

Summary of the questions:

to:

Summary of the questions

Changed lines 137-141 from:
  1. nmon crashes have it starts in collecting to a file mode. See question 2.
  2. nmon Analyser does not work - because the nmon file is empty or incomplete
  3. Can we have a new feature XYZ - and its all ready implemented so read the nmon -h output
  4. I have a problem with the nmon options - turns out they can't read nmon -h which stats -f or -F MUST be the first option on the line
  5. How do I interpret nmon output - first do your how work by learn UNIX and Linux performance statistics: read the command manuals, take a course spend 5 years benchmarking
to:
  1. nmon crashes have it starts in collecting to a file mode.
    • See question 2.
  2. nmon Analyser does not work
    • Quite often the nmon output file is empty due to not waiting long enough - if you request data every 5 minutes with 16 minutes before you try analysing the file!
    • Incomplete if nmon is still running andoutputing data you can grab the file and not have a complete resord right at the end - you could edit with vi to remove the last set of output - see the lines starting ZZZ
  3. Can we have a new feature XYZ?
    • But it turns out its all ready implemented (and has been for a few years)
    • So read the nmon -h output and you might find it
  4. I have a problem with the nmon options
    • Turns out they can't read nmon -h output which states: The -f or -F MUST be the first option on the line
  5. How do I interpret nmon output?
    • First do your home work by learn UNIX and Linux performance statistics: read the command manuals, take a course or spend 5 years benchmarking
    • Sorry but I can't write nmon and teach the world the basics on UNIX/Linuxperformance tuning.
December 05, 2016, at 10:27 AM by 127.0.0.1 -
Changed lines 586-587 from:


to:
Changed lines 625-626 from:
  • You would be amazed how often I get questions about 2 to 40 years old version of nmon for Linux
  • There seems to be a meantalliy with some that if nmon works now then you can pass that version on yo your grand children!
to:
  • You would be amazed how often I get questions about 2 to 5 year old versions of nmon for Linux
  • There seems to be a mentality with some users, that if nmon works now then you can pass that version on to your grandchildren!
Changed lines 640-641 from:

Still got problems

  • AIX: Raise a PMR with IBM Support - assuming you have paid for AIX Support.
to:

Still got a problem? Get some help

  • First create a simple nmon file with the problem for investigation then ...
  • AIX: Raise a PMR with IBM Support - assuming you have paid for AIX Support!
Added lines 646-647:
December 05, 2016, at 10:20 AM by 127.0.0.1 -
Changed lines 619-626 from:
  • First the regular house keeping:
    • Have you filled up a filesystem? df -m
    • Have you got a recent Operating System level? i.e. missing bug fixes
      • Linux cat /etc/*ease
      • AIX oslevel -s if the last four digits don't start with a 15 or 16 then your AIX is probably not fully supported !!!
    • On Linux have you got a current version on nmon?
      • You would be amazed how often I get questions about 2 to 40 years old version of nmon for Linux
      • There seems to be a meantalliy with some that if nmon works now then you can pass that version on yo your grand children!
to:

First the regular house keeping:

  • Have you filled up a filesystem? df -m
  • Have you got a recent Operating System level? i.e. missing bug fixes
    • Linux cat /etc/*ease
    • AIX oslevel -s if the last four digits don't start with a 15 or 16 then your AIX is probably not fully supported !!!
  • On Linux have you got a current version on nmon?
    • You would be amazed how often I get questions about 2 to 40 years old version of nmon for Linux
    • There seems to be a meantalliy with some that if nmon works now then you can pass that version on yo your grand children!
December 05, 2016, at 10:19 AM by 127.0.0.1 -
Deleted line 39:
Added line 41:
Deleted line 46:
Added line 52:
Deleted line 57:
Added line 63:
Deleted line 64:
Deleted lines 67-68:
Deleted lines 68-69:
Changed lines 71-72 from:
to:
  • Question 47: nmon will no stay running - What should I check?
Added lines 617-644:

Question 47: nmon will no stay running - What should I check?

  • First the regular house keeping:
    • Have you filled up a filesystem? df -m
    • Have you got a recent Operating System level? i.e. missing bug fixes
      • Linux cat /etc/*ease
      • AIX oslevel -s if the last four digits don't start with a 15 or 16 then your AIX is probably not fully supported !!!
    • On Linux have you got a current version on nmon?
      • You would be amazed how often I get questions about 2 to 40 years old version of nmon for Linux
      • There seems to be a meantalliy with some that if nmon works now then you can pass that version on yo your grand children!
  • What do you get in the nmon output file?
    • That can provide clues about where it stopped.
    • Use vi to take a look at the end of the file.
  • Have you forgotten to reboot AIX after an upgrade?
    • The result is /unix not matching the kernel that is actually running in memory
  • Stop using duff nmon command line options.
    • Is it possible for you to read the nmon -h output :-)
    • No one seems to read the line Note: use only one of f,F,z,x or X and make it the first argument
  • Are you sure you really want all these two dozen options?
    • Sometimes invalid options cause problems, for example requesting NFS stats when the OS is not using NFS.
  • KISS = What happens if you try something simple like: nmon -f -s1 -c 10
    • This eliminates the advanced options/stats being the cause of an issue.

Still got problems

  • AIX: Raise a PMR with IBM Support - assuming you have paid for AIX Support.
  • Linux: Contact me, raise a question on the Forum or you can raise a bug on the Sourceforge project.
    • DeveloperWorks: Performance Tools Forum
    • nmon for Linux SourceForge bugs
December 01, 2016, at 10:44 AM by 127.0.0.1 -
Changed line 116 from:

Image

to:

http:/docs/nmon_timeline.png

December 01, 2016, at 10:41 AM by 127.0.0.1 -
Added line 116:

Image

December 01, 2016, at 10:36 AM by 127.0.0.1 -
Deleted line 11:
Changed lines 13-16 from:
  • Answers in RED are for very old versions of nmon for AIX

Summary of the questions:

to:
  • Answers in RED are for very old versions of nmon for AIX - At the bottom.

Summary of the questions:

December 01, 2016, at 10:35 AM by 127.0.0.1 -
Changed lines 439-440 from:

Question 34: What is CharIO (a column of the TOP processes stats)?

to:

Question 35: What is CharIO (a column of the TOP processes stats)?

Changed lines 447-448 from:

Question 35: On Linux the disk stats are all doubled?

to:

Question 36: On Linux the disk stats are all doubled?

Changed lines 459-460 from:

Question 36: On AIX the disk seem to be mostly on the first adapter?

to:

Question 37: On AIX the disk seem to be mostly on the first adapter?

Changed lines 472-473 from:

Question 37: On nmon for Linux the CPU Wait for IO number is zero or odd?

to:

Question 38: On nmon for Linux the CPU Wait for IO number is zero or odd?

Changed lines 477-478 from:

Question 38: nmon for Linux has paging details missing and the PAGE lines for the capture to file are missing.

to:

Question 39: nmon for Linux has paging details missing and the PAGE lines for the capture to file are missing.

Changed lines 482-483 from:

Question 39: I want to collect data every second and then see weekly and monthly reports. How?

to:

Question 40: I want to collect data every second and then see weekly and monthly reports. How?

Changed lines 514-515 from:

Question 40: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

to:

Question 41: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

Changed lines 540-541 from:

Question 41: The Disk Busy stats are missing on AIX, what do I do?

to:

Question 42: The Disk Busy stats are missing on AIX, what do I do?

Changed lines 547-548 from:

Question 42: Sort order problems with massive nmon output files?

to:

Question 43: Sort order problems with massive nmon output files?

Changed lines 568-569 from:

Question 43: Does nmon capture point in time stats or averages?

to:

Question 44: Does nmon capture point in time stats or averages?

Changed lines 591-592 from:

Question 44: When will nmon collect data from lots of machines or LPARs?

to:

Question 45: When will nmon collect data from lots of machines or LPARs?

Changed line 613 from:

Question 45: When will nmon collect data like the AIX "topas -C"?

to:

Question 46: When will nmon collect data like the AIX "topas -C"?

December 01, 2016, at 10:33 AM by 127.0.0.1 -
Changed lines 59-76 from:
  • Question 44: What is CharIO (a column of the TOP processes stats)?
  • Question 45: On Linux the disk stats are all doubled?
  • Question 46: On AIX the disk seem to be mostly on the first adapter?
  • Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?
  • Question 48: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.
  • Question 49: I want to collect data every second and then see weekly and monthly reports. How?
  • Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?
  • Question 52: The Disk Busy stats are missing on AIX
  • Question 53: Sort order problems with massive nmon output files.
  • Question 56: Does nmon capture point in time stats or averages?
  • Question 100: When will nmon collect data from lots of machines or LPARs?
  • Question 101: When will nmon collect data like "topas -C"?
to:
  • Question 35: What is CharIO (a column of the TOP processes stats)?
  • Question 36: On Linux the disk stats are all doubled?
  • Question 37: On AIX the disk seem to be mostly on the first adapter?
  • Question 38: On nmon for Linux the CPU Wait for IO number is zero or odd?
  • Question 39: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.
  • Question 40: I want to collect data every second and then see weekly and monthly reports. How?
  • Question 41: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?
  • Question 42: The Disk Busy stats are missing on AIX
  • Question 43: Sort order problems with massive nmon output files.
  • Question 44: Does nmon capture point in time stats or averages?
  • Question 45: When will nmon collect data from lots of machines or LPARs?
  • Question 46: When will nmon collect data like "topas -C"?
Changed lines 439-440 from:

Question 44: What is CharIO (a column of the TOP processes stats)?

to:

Question 34: What is CharIO (a column of the TOP processes stats)?

Changed lines 447-448 from:

Question 45: On Linux the disk stats are all doubled?

to:

Question 35: On Linux the disk stats are all doubled?

Changed lines 459-460 from:

Question 46: On AIX the disk seem to be mostly on the first adapter?

to:

Question 36: On AIX the disk seem to be mostly on the first adapter?

Changed lines 472-473 from:

Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?

to:

Question 37: On nmon for Linux the CPU Wait for IO number is zero or odd?

Changed lines 477-478 from:

Question 48: nmon for Linux has paging details missing and the PAGE lines for the capture to file are missing.

to:

Question 38: nmon for Linux has paging details missing and the PAGE lines for the capture to file are missing.

Changed lines 482-483 from:

Question 49: I want to collect data every second and then see weekly and monthly reports. How?

to:

Question 39: I want to collect data every second and then see weekly and monthly reports. How?

Changed lines 514-515 from:

Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

to:

Question 40: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

Changed lines 540-541 from:

Question 52: The Disk Busy stats are missing on AIX, what do I do?

to:

Question 41: The Disk Busy stats are missing on AIX, what do I do?

Changed lines 547-548 from:

Question 53: Sort order problems with massive nmon output files?

to:

Question 42: Sort order problems with massive nmon output files?

Changed lines 568-569 from:

Question 56: Does nmon capture point in time stats or averages?

to:

Question 43: Does nmon capture point in time stats or averages?

Changed lines 591-592 from:

Question 100: When will nmon collect data from lots of machines or LPARs?

to:

Question 44: When will nmon collect data from lots of machines or LPARs?

Changed line 613 from:

Question 101: When will nmon collect data like the AIX "topas -C"?

to:

Question 45: When will nmon collect data like the AIX "topas -C"?

December 01, 2016, at 10:30 AM by 127.0.0.1 -
Deleted line 26:
Changed lines 37-58 from:
  • Question 21: TOP process stats get switched on when I request Asynchronous I/O stats?
  • Question 23: nmon2rrd fails, please fix it?
  • Question 24: NANQ and INF?
  • Question 26: nmon reports more than 100% for a process - clearly it is wrong?
  • Question 27: On AIX the disk adapter are wrong?
  • Question 28: on AIX the adapter busy goes over 100%. That is impossible surely?
  • Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?
  • Question 30: What about nmon for Windows?
  • Question 31: Seeing double the number of CPUs?
  • Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?
  • Question 34: CPU wait is too high, how can I reduce it?
  • Question 35: On AIX, free memory is near zero, how do I free more memory?
  • Question 36: How can I set numperm better?
  • Question 37: What format is the nmon output file?
  • Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?
  • Question 39: nmon does not work on my Linux machine!!
  • Question 40: When do we get nmon 10 for Linux?
  • Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?
  • Question 42: I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?
to:
  • Question 16: TOP process stats get switched on when I request Asynchronous I/O stats?
  • Question 17: nmon2rrd fails, please fix it?
  • Question 18: NANQ and INF?
  • Question 19: nmon reports more than 100% for a process - clearly it is wrong?
  • Question 20: On AIX the disk adapter are wrong?
  • Question 21: on AIX the adapter busy goes over 100%. That is impossible surely?
  • Question 22: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?
  • Question 23: What about nmon for Windows?
  • Question 24: Seeing double the number of CPUs?
  • Question 25: Hello, I am new to UNIX and want to tune AIX, what do you recommend?
  • Question 26: CPU wait is too high, how can I reduce it?
  • Question 27: On AIX, free memory is near zero, how do I free more memory?
  • Question 28: How can I set numperm better?
  • Question 29: What format is the nmon output file?
  • Question 30: I have collected once a second for 8 hours but I can't get the Analyser to work?
  • Question 31: nmon does not work on my Linux machine!!
  • Question 32: When do we get nmon 10 for Linux?
  • Question 33: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?
  • Question 34: I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?
Changed lines 232-233 from:

Question 21: TOP process stats get switched on when I request AIX Asynchronous I/O stats?

to:

Question 16: TOP process stats get switched on when I request AIX Asynchronous I/O stats?

Changed lines 238-239 from:

Question 23: nmon2rrd fails, please fix it?

to:

Question 17: nmon2rrd fails, please fix it?

Changed lines 246-247 from:

Question 24: What are NANQ and INF?

to:

Question 18: What are NANQ and INF?

Changed lines 254-255 from:

Question 26: nmon reports more than 100% for a process - clearly it is wrong?

to:

Question 19: nmon reports more than 100% for a process - clearly it is wrong?

Changed lines 262-263 from:

Question 27: On AIX the disk adapters are wrong?

to:

Question 20: On AIX the disk adapters are wrong?

Changed lines 273-274 from:

Question 28: On AIX the adapter busy goes over 100%. That is impossible surely?

to:

Question 21: On AIX the adapter busy goes over 100%. That is impossible surely?

Changed lines 282-283 from:

Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?

to:

Question 22: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?

Changed lines 289-290 from:

Question 30: What about nmon for Windows?

to:

Question 23: What about nmon for Windows?

Changed lines 298-299 from:

Question 31: Seeing double the number of CPUs on my POWER server?

to:

Question 24: Seeing double the number of CPUs on my POWER server?

Changed lines 307-308 from:

Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?

to:

Question 25: Hello, I am new to UNIX and want to tune AIX, what do you recommend?

Changed lines 314-315 from:

Question 34: CPU wait is too high, how can I reduce it?

to:

Question 26: CPU wait is too high, how can I reduce it?

Changed lines 330-331 from:

Question 35: On AIX, free memory is near zero, how do I free more memory?

to:

Question 27: On AIX, free memory is near zero, how do I free more memory?

Changed lines 337-338 from:

Question 36: How can I set numperm better?

to:

Question 28: How can I set numperm better?

Changed lines 343-344 from:

Question 37: What format is the nmon output file?

to:

Question 29: What format is the nmon output file?

Changed lines 359-360 from:

Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?

to:

Question 30: I have collected once a second for 8 hours but I can't get the Analyser to work?

Changed lines 375-376 from:

Question 39: nmon does not work on my Linux machine!

to:

Question 31: nmon does not work on my Linux machine!

Changed lines 382-383 from:

Question 40: When do we get nmon for AIX version X for Linux?

to:

Question 32: When do we get nmon for AIX version X for Linux?

Changed lines 391-392 from:

Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?

to:

Question 33: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?

Changed line 425 from:

Question 42: I have 2400 disks (or 2400 small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?

to:

Question 34: I have 2400 disks (or 2400 small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?

December 01, 2016, at 10:23 AM by 127.0.0.1 -
Changed lines 627-632 from:
  • Hard luck
  • I will actively help get AIX 5 bugs fixed but older versions are very much less interesting.
  • In particular, on AIX 4.1.5 the TOP processes does not work but I am not going to fix it unless someone offers me hard currency
to:
  • Hard luck
  • I will actively help get AIX 5 bugs fixed but older versions are very much less interesting.
  • In particular, on AIX 4.1.5 the TOP processes does not work but I am not going to fix it unless someone offers me hard currency
Changed lines 634-638 from:
  • AIX
    • Not in AIX 4 - there are no adapter stats in this AIX.
    • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. XXX
to:
  • AIX
    • Not in AIX 4 - there are no adapter stats in this AIX.
    • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. XXX
Changed lines 641-645 from:
  • The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon - there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a.
to:
  • The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon - there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a.
Changed lines 648-650 from:
  • Available from the AIX 5.1 onwards
to:
  • Available from the AIX 5.1 onwards
Changed lines 653-656 from:
  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • In 95% of cases it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does not match what is actually running, you get this message.
to:
  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • In 95% of cases it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does not match what is actually running, you get this message.
Changed lines 671-674 from:
  • AIX
    • Correct, this data is not available on AIX 5.1 from the libperfstat library.
    • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
    • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.
to:
  • AIX
    • Correct, this data is not available on AIX 5.1 from the libperfstat library.
    • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
    • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.
December 01, 2016, at 10:20 AM by 127.0.0.1 -
Changed lines 100-102 from:
  • 1997 - nmon for AIX (AIX version 3.1.5). Developed to save Nigel Griffiths effort in monitoring benchmarks and creating Benchmark report graphs.
    • Remember back then these benchmarks only used dumb terminals with 80x25 character screen. Graphics were available but took too much CPU time.
    • Also Microsoft Excel not invented so we used Lotus 1-2-3 with very limited CSV file size limits.
to:
  • 1997 - nmon for AIX (AIX version 3.1.5). Developed to save Nigel Griffiths effort in monitoring benchmarks and creating benchmark report graphs.
    • Remember back then these benchmarks only used dumb terminals with 80x25 character screens (hence dense on-screen stats). Graphics adapters were available but took too much CPU time to use in benchmarks.
    • Also Microsoft Excel not invented so we used Lotus 1-2-3 with very limited CSV file size limits (hence dense file stats with low duplication).
December 01, 2016, at 10:18 AM by 127.0.0.1 -
Changed line 100 from:
  • 1997 - nmon for AIX developer to save Nigel Griffiths effort in monitoring benchmarks and creating Benchmark report graphs
to:
  • 1997 - nmon for AIX (AIX version 3.1.5). Developed to save Nigel Griffiths effort in monitoring benchmarks and creating Benchmark report graphs.
Changed lines 102-112 from:
  • Also Microsoft Excel not invented so we used Lotus 1-2-3 with very limited CSV file size limits
  • 1998 - nmon for AIX (AIX 3+) only available on floppy disk to IBM Benchmark team in the UK and IBM Montpelier
  • 2001 - nmon for AIX 5.5 binaries first released internally in IBM on a Webserver
  • 2003
  • 21st Nov 2008 - nmon made part of AIX
    • In the following year nmon appears on every subsequent release as a default installed command
  • 27th July 2009 - nmon for Linux released to open source
    • nmon for Linux released to open source under GPL - it was an internal project at IBM for many years.
    • The source code and further information is available at Sourceforge and in particular at the new nmon for Linux wiki at http://nmon.sourceforge.net
    • This means that you can compile nmon for Linux for your specific Linux flavour and help improve it further.
to:
  • Also Microsoft Excel not invented so we used Lotus 1-2-3 with very limited CSV file size limits.
  • 1998 - nmon for AIX (AIX 3+) only available on floppy disk to IBM Benchmark team in the UK and IBM Montpelier.
  • 2001 - nmon for AIX 5.5 binaries first released internally in IBM on a Webserver.
  • 2003 - nmon for Linux code started; distributed only as binaries.
  • 21st Nov 2008 - nmon made part of AIX
    • In the following year nmon appears on every subsequent release as a default installed command.
    • This means nmon for AIX gets full IBM Problem Management Report (PMR) support like any other command.
    • Note graphing tools like nmon Analyser and nmonchart are not part of AIX and not part of AIX Support. These are still handled by willing IBMers and largely in their own time.
  • Ongoing nmon for AIX development by the IBM AIX development team - especially the performance tools team.
  • 27th July 2009 - nmon for Linux released to open source
    • nmon for Linux released to open source under GPL - it was an internal project at IBM for many years.
    • The source code and further information is available at Sourceforge and in particular at the new nmon for Linux wiki at http://nmon.sourceforge.net
    • This means that you can compile nmon for Linux for your specific Linux flavour and help improve it further.
  • Ongoing nmon for Linux development by Nigel Griffiths
December 01, 2016, at 10:04 AM by 127.0.0.1 -
Changed lines 623-624 from:

OLD Question 1: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

to:

OLD Question 1: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

Changed lines 630-631 from:

OLD Question 2: Can I get the adapters stats from other tools?

to:

OLD Question 2: Can I get the adapters stats from other tools?

Changed lines 637-638 from:

OLD Question 3: When I start nmon 9 on a system that it use to run fine I know get an error message?

to:

OLD Question 3: When I start nmon 9 on a system that it use to run fine I know get an error message?

Changed lines 644-645 from:

OLD Question 4: Can you add the monitoring of process priority?

to:

OLD Question 4: Can you add the monitoring of process priority?

Changed lines 649-650 from:

OLD Question 5: nmon on AIX, nmon 9 does not run, please fix?

to:

OLD Question 5: nmon on AIX, nmon 9 does not run, please fix?

Changed lines 658-663 from:

OLD Question 6: Very old question about nmon 10 and WPAR stats removed

  • Removed

OLD Question 7: Old nmon version question: nmon and AIX commands do not agree?

to:

OLD Question 6: Old nmon version question: nmon and AIX commands do not agree?

Changed lines 667-672 from:

OLD Question 8: Very old nmon version for AIX: question about NFS driver failures removed

  • Removed

OLD Question 9: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

to:

OLD Question 7: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

Changed lines 675-676 from:

OLD Question 10: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

to:

OLD Question 8: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

Changed line 688 from:

OLD Question 11: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

to:

OLD Question 9: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

Changed lines 706-707 from:

[+OLD Question 12: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"+#

to:

OLD Question 10: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"

Changed lines 730-731 from:

OLD Question 13: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

to:

OLD Question 11: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

Changed line 738 from:

OLD Question 14: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

to:

OLD Question 12: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

December 01, 2016, at 10:00 AM by 127.0.0.1 -
Added line 10:
  • Answers in BLUE are for nmon for Linux
Added lines 12-13:
  • Answers in BLACK apply to both versions
Changed lines 15-18 from:
  • Answers in BLUE are for nmon for Linux
  • Answers in BLACK apply to both versions
to:
Changed lines 25-37 from:
  • Question 8: What is the most reported error for nmon?
  • Question 11: Can I decide the filename it saves data too?
  • Question 12: What is the default output filename?
  • Question 13: I want nmon output piped into a further command, how?
  • Question 14: Why do you support all these old unsupported AIX versions?
  • Question 15: What if I want support?
  • Question 16: Why don't you add a Java front end to nmon and get graphical output?
  • Question 17: The command line options don't seem to work right for file capture?
  • Question 18: What is paging to a filesystem?
  • Question 19: Where can I get nmon and further information?
to:
  • Question 6: What is the most reported error for nmon?
  • Question 7: Can I decide the filename it saves data too?
  • Question 8: What is the default output filename?
  • Question 9: I want nmon output piped into a further command, how?
  • Question 10: Why do you support all these old unsupported AIX versions?
  • Question 11: What if I want support?
  • Question 12: Why don't you add a Java front end to nmon and get graphical output?
  • Question 13: The command line options don't seem to work right for file capture?
  • Question 14: What is paging to a filesystem?
  • Question 15: Where can I get nmon and further information?
Changed lines 136-137 from:

Question 8: What is the most reported error for nmon?

to:

Question 6: What is the most reported error for nmon?

Changed lines 145-146 from:

Question 11: Can I decide the filename nmon saves data too?

to:

Question 7: Can I decide the filename nmon saves data too?

Changed lines 150-151 from:

Question 12: What is the default output filename?

to:

Question 8: What is the default output filename?

Changed lines 160-161 from:

Question 13: I want nmon output piped into a further command, how?

to:

Question 9: I want nmon output piped into a further command, how?

Changed lines 171-172 from:

Question 14: Why do you support all these old unsupported AIX versions?

to:

Question 10: Why do you support all these old unsupported AIX versions?

Changed lines 178-179 from:

Question 15: What if I want support?

to:

Question 11: What if I want support?

Changed lines 195-196 from:

Question 16: Why don't you add a Java front end to nmon and get graphical output?

to:

Question 12: Why don't you add a Java front end to nmon and get graphical output?

Changed lines 202-203 from:

Question 17: The command line options don't seem to work right for file capture?

to:

Question 13: The command line options don't seem to work right for file capture?

Changed lines 209-210 from:

Question 18: What is paging to a filesystem (rather than to paging space)?

to:

Question 14: What is paging to a filesystem (rather than to paging space)?

Changed line 217 from:

Question 19: Where can I get nmon and further information?

to:

Question 15: Where can I get nmon and further information?

December 01, 2016, at 09:58 AM by 127.0.0.1 -
Changed line 20 from:
to:
  • Question 3: Significant nmon dates?
Added lines 98-113:

Question 3: Significant nmon dates?

  • 1997 - nmon for AIX developer to save Nigel Griffiths effort in monitoring benchmarks and creating Benchmark report graphs
    • Remember back then these benchmarks only used dumb terminals with 80x25 character screen. Graphics were available but took too much CPU time.
    • Also Microsoft Excel not invented so we used Lotus 1-2-3 with very limited CSV file size limits
  • 1998 - nmon for AIX (AIX 3+) only available on floppy disk to IBM Benchmark team in the UK and IBM Montpelier
  • 2001 - nmon for AIX 5.5 binaries first released internally in IBM on a Webserver
  • 2003
  • 21st Nov 2008 - nmon made part of AIX
    • In the following year nmon appears on every subsequent release as a default installed command
  • 27th July 2009 - nmon for Linux released to open source
    • nmon for Linux released to open source under GPL - it was an internal project at IBM for many years.
    • The source code and further information is available at Sourceforge and in particular at the new nmon for Linux wiki at http://nmon.sourceforge.net
    • This means that you can compile nmon for Linux for your specific Linux flavour and help improve it further.
Changed lines 622-623 from:

Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

to:

OLD Question 1: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

Changed lines 629-630 from:

Question 6: Can I get the adapters stats from other tools?

to:

OLD Question 2: Can I get the adapters stats from other tools?

Changed lines 636-637 from:

Question 7: When I start nmon 9 on a system that it use to run fine I know get an error message?

to:

OLD Question 3: When I start nmon 9 on a system that it use to run fine I know get an error message?

Changed lines 643-644 from:

Question 9: Can you add the monitoring of process priority?

to:

OLD Question 4: Can you add the monitoring of process priority?

Changed lines 648-649 from:

Question 10: nmon on AIX, nmon 9 does not run, please fix?

to:

OLD Question 5: nmon on AIX, nmon 9 does not run, please fix?

Changed lines 657-658 from:

Question 20: Very old question about nmon 10 and WPAR stats removed

to:

OLD Question 6: Very old question about nmon 10 and WPAR stats removed

Changed line 662 from:

Question 25: Old nmon version question: nmon and AIX commands do not agree?

to:

OLD Question 7: Old nmon version question: nmon and AIX commands do not agree?

Changed lines 671-672 from:

Question 32: Very old nmon version for AIX: question about NFS driver failures removed

to:

OLD Question 8: Very old nmon version for AIX: question about NFS driver failures removed

Changed lines 675-676 from:

Question 43: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

to:

OLD Question 9: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

Changed lines 683-684 from:

Question 50: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

to:

OLD Question 10: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

Changed line 696 from:

Question 54: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

to:

OLD Question 11: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

Changed lines 714-715 from:

[+Question 54: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"+#

to:

[+OLD Question 12: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"+#

Changed lines 738-739 from:

Question 55: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

to:

OLD Question 13: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

Changed line 746 from:

Question 57: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

to:

OLD Question 14: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

November 22, 2016, at 05:34 AM by 127.0.0.1 -
Changed lines 81-88 from:

AIX

  • On AIX with these or later versions: AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version You should run the nmon that comes with AIX and is installed by default.
  • It is strongly recommended if you have problems to first add all available service packs for your AIX release as this removes 99% of problems.
  • If you have earlier AIX versions then you can run nmon classic downloadable from XXX

Linux

  • On Linux go to the nmon for Linux website (http://nmon.sourceforge.net) to download nmon. It is compiled for 50 different platforms (POWER, x86, x86_64 and Mainframe ) and Linux distributions combinations.
  • If your combination is not on the list or you have a newer Linux version you can now compile it up yourself.
to:
  • AIX
    • On AIX with these or later versions: AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version You should run the nmon that comes with AIX and is installed by default.
    • It is strongly recommended if you have problems to first add all available service packs for your AIX release as this removes 99% of problems.
    • If you have earlier AIX versions then you can run nmon classic downloadable from XXX
  • Linux
    • On Linux go to the nmon for Linux website (http://nmon.sourceforge.net) to download nmon. It is compiled for 50 different platforms (POWER, x86, x86_64 and Mainframe ) and Linux distributions combinations.
    • If your combination is not on the list or you have a newer Linux version you can now compile it up yourself.
November 22, 2016, at 05:33 AM by 127.0.0.1 -
Changed lines 100-107 from:

Linux

  • First check it is executable (this gets switched off by FTP).
  • Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH.

AIX

  • nmon since AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version is a default install and the starting shell script can be found in /usr/bin/nmon - it actually starts the executable called nmon_topas.
to:
  • Linux
    • First check it is executable (this gets switched off by FTP).
    • Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH.
  • AIX
    • nmon since AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version is a default install and the starting shell script can be found in /usr/bin/nmon - it actually starts the executable called nmon_topas.
Changed lines 110-112 from:

AIX

  • No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.
  • Yes - if your tape drive is Fibre Channel connected it is very common to have it connected on a different FC adapter to allow performance settings to suit the tape drive = streams of large blocks.
to:
  • AIX
    • No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.
    • Yes - if your tape drive is Fibre Channel connected it is very common to have it connected on a different FC adapter to allow performance settings to suit the tape drive = streams of large blocks.
Changed lines 115-118 from:

Linux No FC Adapter options for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP.

to:
  • Linux
    • No FC Adapter options for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP.
Deleted line 150:
Changed lines 168-174 from:

AIX

  • nmon for AIX is a fully supported AIX command so you can raise an IBM Problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.

Linux

  • nmon for Linux is becoming part of the popular distributions - if you have paid for support you could request help
  • You can raise bugs on the sourceforge.net website for the nmon project: https://sourceforge.net/projects/nmon/
to:
  • AIX
    • nmon for AIX is a fully supported AIX command so you can raise an IBM Problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.
  • Linux
    • nmon for Linux is becoming part of the popular distributions - if you have paid for support you could request help
    • You can raise bugs on the sourceforge.net website for the nmon project: https://sourceforge.net/projects/nmon/
Changed line 187 from:
  • The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them.
to:
  • The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them.
Changed lines 245-250 from:

AIX

  • nmon just outputs what it gets from the libperfstat library.
  • For multipath I/O it is often the disk to adapter mapping reflects the order of disk discovery rather than some balanced view.
  • This is an AIX problem and not nmon's fault.
  • To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo - and the adapt sample program: https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool
to:
  • AIX
    • nmon just outputs what it gets from the libperfstat library.
    • For multipath I/O it is often the disk to adapter mapping reflects the order of disk discovery rather than some balanced view.
    • This is an AIX problem and not nmon's fault.
    • To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo - and the adapt sample program: https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool
Changed lines 313-317 from:

AIX

  • This is just how AIX works and is perfectly normal. All of memory will be soaked up with copies of filesystem blocks after a reasonable length of time and the free memory will be near zero. AIX will then use the lrud process to keep the free list at a reasonable level.
  • If you see the lrud process taking more than 30% of a CPU then you need to investigate and make memory parameter changes.
to:
  • AIX
    • This is just how AIX works and is perfectly normal. All of memory will be soaked up with copies of filesystem blocks after a reasonable length of time and the free memory will be near zero. AIX will then use the lrud process to keep the free list at a reasonable level.
    • If you see the lrud process taking more than 30% of a CPU then you need to investigate and make memory parameter changes.
Added line 426:
Changed lines 429-438 from:

Linux

  • nmon collects the data from /proc and displays it.
  • On newer Linux Kernels this is the /proc/diskstats file.
  • It was decided a long time ago that hiding data was a very bad idea as it can go wrong and then be very misleading
    • This is how the ozone hole was missed for 5 years and not detected - the algorithm decided the data must be wrong and deleted it from the stats.
  • The Linux disk stats (in three different files and four formats depending on the Linux version - great coding guys!!) report both disk level and disk partition level stats in the same file. nmon just shows you the stats - it is your job to understand them.
  • nmon does not hide anything: with LUNs on SAN disks, software RAID and LVM, it is much safer to show everything.
  • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.
to:
  • Linux
    • nmon collects the data from /proc and displays it.
    • On newer Linux Kernels this is the /proc/diskstats file.
    • It was decided a long time ago that hiding data was a very bad idea as it can go wrong and then be very misleading
      • This is how the ozone hole was missed for 5 years and not detected - the algorithm decided the data must be wrong and deleted it from the stats.
    • The Linux disk stats (in three different files and four formats depending on the Linux version - great coding guys!!) report both disk level and disk partition level stats in the same file. nmon just shows you the stats - it is your job to understand them.
    • nmon does not hide anything: with LUNs on SAN disks, software RAID and LVM, it is much safer to show everything.
    • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.
Changed lines 441-451 from:

AIX

  • nmon now collects the adapter data from AIX libperfstat.
  • These are the disk stats added up per adapter, by knowing which disk is connected to which adapter.
  • This, of course, is complex for multipath IO disks.
  • AIX seems to build this map from the order in which disks are discovered rather than used.
  • Depending on your initial setup it can often mean that most disks are assigned to the first one or two adapters.
  • Sorry, there is nothing that nmon can do about this.
  • To list what nmon is extracting from the libperfstat library, you can use the sample code and the binaries precompiled for AIX 5.3 (and onwards) from the Roll Your Own (ryo) wiki page.
  • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.
to:
  • AIX
    • nmon now collects the adapter data from AIX libperfstat.
    • These are the disk stats added up per adapter, by knowing which disk is connected to which adapter.
    • This, of course, is complex for multipath IO disks.
    • AIX seems to build this map from the order in which disks are discovered rather than used.
    • Depending on your initial setup it can often mean that most disks are assigned to the first one or two adapters.
    • Sorry, there is nothing that nmon can do about this.
    • To list what nmon is extracting from the libperfstat library, you can use the sample code and the binaries precompiled for AIX 5.3 (and onwards) from the Roll Your Own (ryo) wiki page.
    • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.
Changed lines 490-493 from:
  • Ganglia
  • LPAR2RRD
to:
  • Ganglia
  • LPAR2RRD
Deleted line 547:
Deleted line 568:
Added line 570:

Changed lines 615-619 from:

AIX

  • Not in AIX 4 - there are no adapter stats in this AIX.
  • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but be warned: this is data derived from the connected disks (NOT tape drives) because there are no adapter stats. XXX
to:
  • AIX
    • Not in AIX 4 - there are no adapter stats in this AIX.
    • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but be warned: this is data derived from the connected disks (NOT tape drives) because there are no adapter stats. XXX
Changed lines 661-666 from:

AIX

  • Correct, this data is not available on AIX 5.1 from the libperfstat library.
  • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
  • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.
to:
  • AIX
    • Correct, this data is not available on AIX 5.1 from the libperfstat library.
    • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
    • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.
Changed line 704 from:

[@

to:
  • [@
November 22, 2016, at 05:19 AM by 127.0.0.1 -
Changed line 20 from:
  • Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?
to:
Changed lines 23-24 from:
  • Question 6: Can I get the adapters stats from other tools?
  • Question 7: When I start nmon 9 on a system where it used to run fine I now get an error message?
to:
Changed lines 25-26 from:
  • Question 9: Can you add the monitoring of process priority?
  • Question 10: on AIX, nmon 9 does not run, please fix?
to:
Changed line 36 from:
  • Question 20: nmon crashes after about 200 snapshots on AIX?
to:
Changed line 40 from:
  • Question 25: nmon and AIX commands do not agree?
to:
Changed line 47 from:
  • Question 32: 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext ?
to:
Changed line 58 from:
  • Question 43: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?
to:
Changed line 65 from:
  • Question 50: nmon will not start on AIX 5.1 due to a libperfstat error?
to:
Changed lines 69-71 from:
  • Question 54: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"
  • Question 54: AIX 5.3 updated but then nmon gives "Assert Failure"
  • Question 55: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats
to:
Changed line 72 from:
  • Question 57: Why is the Process memory percentage zero? (same for System and User percent)
to:
November 22, 2016, at 05:16 AM by 127.0.0.1 -
Deleted lines 99-105:

Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

  • Hard luck
  • I will actively help get AIX 5 bugs fixed but older versions are very much less interesting.
  • In particular, on AIX 4.1.5 the TOP processes does not work but I am not going to fix it unless some one offers me hard currency
Deleted lines 120-133:

Question 6: Can I get the adapters stats from other tools?

AIX

  • Not in AIX 4 - there are no adapter stats in this AIX.
  • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. XXX

Question 7: When I start nmon 9 on a system that it use to run fine I know get an error message?

  • The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon -there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a.
Deleted lines 129-142:

Question 9: Can you add the monitoring of process priority?

  • Available from the AIX 5.1 onwards

Question 10: nmon on AIX, nmon 9 does not run, please fix?

  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • In 95% of the time it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does no match what is actually running, you get this message.

You can also get really weird effects, if you have messed up LIBPATH.

Deleted lines 215-219:

Question 20: Very old question about nmon 10 and WPAR stats removed

  • Removed
Deleted lines 237-245:

Question 25: Old nmon version question: nmon and AIX commands do not agree?

  • A lot of this happens with nmon 10 and the Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partition.
  • Some of it is because the AIX commands are very unclear about what they are reporting.
  • What was CPU numbers can now be physical CPU, Logical CPU or Virtual CPU numbers and the documentation is unclear.
  • So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.
  • also see question 26.
Deleted lines 290-294:

Question 32: Very old nmon version for AIX: question about NFS driver failures removed

  • Removed
Deleted lines 421-428:

Question 43: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

AIX

  • Correct, this data is not available on AIX 5.1 from the libperfstat library.
  • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
  • Recommended action upgrade AIX as 5.1 is not supported without purchasing extended support.
Changed lines 495-507 from:

Question 50: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

  • The error is something like:
    • exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors:
    • 0509-150 Dependent module libperfstat.a(shr.o) could not be loaded.
    • 0509-022 Cannot load module libperfstat.a(shr.o).
    • 0509-026 System error: A file or directory in the path name does not exist.
  • You will need to have installed the libperfstat library from the AIX CDROMs.
  • This is in bos.perf.libperfstat package.
  • I hope you realise that AIX 5.1 is not normally supported as it is so old.
to:
Changed lines 550-599 from:

Question 54: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

  • This has been reported shortly after an upgrade to a AIX 5.3 higher ML (like ML5 or ML6) and reboot.
  • After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen.
  • The problem seems to be nmon jumping to a library like libperfstat and the jump vectors are not right so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course).
  • This is a bug in AIX and its update process where the libperfstat kernel package does not match the library.
  • Try the following command: # lslpp -L | grep -i perfstat
  • You may get something like:
# lslpp -L | grep -i perfstat
  bos.perf.libperfstat      5.3.0.50    C     F    Performance Statistics Library
  bos.perf.perfstat         5.3.0.60    C     F    Performance Statistics
  • Update the package bos.perf.libperfstat to the same (5.3.0.60) or at least much closer levels (like 5.3.0.60 and 5.3.0.61) as bos.perf.perfstat. Preferably, the latest available levels.

Question 54: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"

  • This has been reported shortly after an upgrade - some machines have this problems while others don't.
  • There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery.
  • The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true.
  • The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:
ERROR: Assert Failure in file="nmon11.c" in function="main" at line=3300
ERROR: Reason=System call returned -1
ERROR: Expression=[[perfstat_diskadapter((perfstat_id_t * )FIRST_DISKADAPTER, p->adapt, sizeof(perfstat_diskadapter_t), adapters)]]
ERROR: errno=22
ERROR: errno means : Invalid argument
  • Then it has been found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, or kernel modules need to be reloaded or libperfstat in a twist - one thing we do know - its not nmon! If you hit this problem:
  1. Check the software levels, see Question 53
  2. Do you think that you rebooted after the upgrade or do you know for absolutely sure!!
  3. Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
  4. Try the latest beta version of nmon (if it supports your AIX level).
  5. I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
  6. If still its a problem, let us know via the usual AIX Performance Tools Forum.

Question 55: Old AIX version:On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

  • This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!). The work around is:
  1. Do not include NFS statistics (remove the -N)
  2. Move to nmon12 that codes around these bugs.
to:
Changed lines 572-579 from:

Question 57: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

  • This seems to happen in AIX 5.3 TL07 or there about. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.
  • nmon reports this problem by showing 0% - which is clearly impossible.
  • The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed (for at least some time but may reappear), if you reboot the machine/LPAR.
  • The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. You will have to ask AIX Support by asking for a fix to the libperfstat library to fix the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!
  • In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then do not use any of these percentages.
to:
Added lines 603-604:
Changed line 606 from:

Older nmon and AIX questions

to:

Very Old nmon and Very old AIX questions = Historic Interest Only

Added line 608:
Added lines 683-700:

Question 54: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

  • This has been reported shortly after an upgrade to a higher AIX 5.3 ML (like ML5 or ML6) and reboot.
  • After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen.
  • The problem seems to be nmon jumping to a library like libperfstat and the jump vectors are not right so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course).
  • This is a bug in AIX and its update process where the libperfstat kernel package does not match the library.
  • Try the following command: # lslpp -L | grep -i perfstat
  • You may get something like:
# lslpp -L | grep -i perfstat
  bos.perf.libperfstat      5.3.0.50    C     F    Performance Statistics Library
  bos.perf.perfstat         5.3.0.60    C     F    Performance Statistics
  • Update the package bos.perf.libperfstat to the same level as bos.perf.perfstat (5.3.0.60 here), or at least to a much closer level (like 5.3.0.60 and 5.3.0.61). Preferably, move both to the latest available levels.
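
The level comparison described above can be sketched as a small shell function. This is only an illustration: the function name check_perfstat_levels is hypothetical (it is not part of nmon or AIX), and it assumes the lslpp -L output has the fileset name in the first column and the level in the second, as in the sample output shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nmon or AIX): reads `lslpp -L` output on
# stdin and reports whether the two perfstat filesets are at the same level.
# Assumption: column 1 is the fileset name and column 2 is the level.
check_perfstat_levels() {
    awk '
        $1 == "bos.perf.libperfstat" { lib = $2 }
        $1 == "bos.perf.perfstat"    { krn = $2 }
        END {
            if (lib == "" || krn == "") {
                print "UNKNOWN: one or both filesets not found"
            } else if (lib == krn) {
                print "MATCH: both filesets at " lib
            } else {
                print "MISMATCH: libperfstat=" lib " perfstat=" krn
            }
        }'
}

# On AIX this would be used as:
#   lslpp -L | check_perfstat_levels
```

A MISMATCH result only points at the situation described in this answer; the fix is still to update the filesets as described.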
November 22, 2016, at 05:10 AM by 127.0.0.1 -
Changed line 144 from:
  1. nmon crashes when it starts in collecting to a file mode.
to:
  1. nmon crashes when it starts in collecting to a file mode. See question 2.
Changed lines 207-208 from:
  • nmon for AIX is a fully supported AIX command so you can raise a IBM Problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.
to:
  • nmon for AIX is a fully supported AIX command so you can raise a IBM Problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.
Changed line 225 from:
  • The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them.
to:
  • The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them.
Added line 255:
Changed lines 270-271 from:

Question 24: what are NANQ and INF?

to:

Question 24: What are NANQ and INF?

Changed lines 275-277 from:
  • when nmon uses printf to display the invalid number it outputs these strings instead.
to:
  • When nmon uses printf to display the invalid number it outputs these strings instead.
Changed lines 337-338 from:
to:
  • Of course, POWER6, POWER7 and POWER8 machines have higher SMT levels.
Deleted line 360:
Deleted line 361:
Deleted line 362:
Changed lines 375-376 from:

[+Question 36: '''How can I set numperm better?

to:

Question 36: How can I set numperm better?

Changed line 574 from:

[@

to:
  • [@
Changed line 616 from:

[@

to:
  • [@
November 22, 2016, at 05:04 AM by 127.0.0.1 -
Added line 476:
Added line 666:
Added lines 733-851:

Older nmon and AIX questions


Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

  • Hard luck
  • I will actively help get AIX 5 bugs fixed but older versions are very much less interesting.
  • In particular, on AIX 4.1.5 the TOP processes display does not work, but I am not going to fix it unless someone offers me hard currency.

Question 6: Can I get the adapters stats from other tools?

AIX

  • Not in AIX 4 - there are no adapter stats in this AIX.
  • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but be warned: this is data derived from the connected disks (NOT tape drives) because there are no adapter stats. XXX

Question 7: When I start nmon 9 on a system where it used to run fine I now get an error message?

  • The error is something about "lslpp" on AIX 5.1 from about ML03 onwards - or - WLM stats go missing after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon - there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2, has code to work around these bugs and can be used instead of nmon9a.

Question 9: Can you add the monitoring of process priority?

  • Available from the AIX 5.1 onwards

Question 10: nmon on AIX, nmon 9 does not run, please fix?

  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • 95% of the time it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does not match what is actually running, you get this message.

You can also get really weird effects if you have messed up LIBPATH.

Question 20: Very old question about nmon 10 and WPAR stats removed

  • Removed

Question 25: Old nmon version question: nmon and AIX commands do not agree?

  • A lot of this happens with nmon 10 and the Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partition.
  • Some of it is because the AIX commands are very unclear about what they are reporting.
  • What was CPU numbers can now be physical CPU, Logical CPU or Virtual CPU numbers and the documentation is unclear.
  • So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.
  • Also see question 26.

Question 32: Very old nmon version for AIX: question about NFS driver failures removed

  • Removed

Question 43: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

AIX

  • Correct, this data is not available on AIX 5.1 from the libperfstat library.
  • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
  • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.

Question 50: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

  • The error is something like:
    • exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors:
    • 0509-150 Dependent module libperfstat.a(shr.o) could not be loaded.
    • 0509-022 Cannot load module libperfstat.a(shr.o).
    • 0509-026 System error: A file or directory in the path name does not exist.
  • You will need to have installed the libperfstat library from the AIX CDROMs.
  • This is in bos.perf.libperfstat package.
  • I hope you realise that AIX 5.1 is not normally supported as it is so old.

Question 54: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"

  • This has been reported shortly after an upgrade - some machines have this problem while others don't.
  • There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery.
  • The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true.
  • The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:
ERROR: Assert Failure in file="nmon11.c" in function="main" at line=3300
ERROR: Reason=System call returned -1
ERROR: Expression=[[perfstat_diskadapter((perfstat_id_t * )FIRST_DISKADAPTER, p->adapt, sizeof(perfstat_diskadapter_t), adapters)]]
ERROR: errno=22
ERROR: errno means : Invalid argument
  • Then it has been found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, kernel modules that need to be reloaded or libperfstat in a twist - one thing we do know - it's not nmon! If you hit this problem:
  1. Check the software levels, see Question 53
  2. Do you think that you rebooted after the upgrade or do you know for absolutely sure?
  3. Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
  4. Try the latest beta version of nmon (if it supports your AIX level).
  5. I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
  6. If it is still a problem, let us know via the usual AIX Performance Tools Forum.

Question 55: Old AIX version: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

  • This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!). The workaround is:
  1. Do not include NFS statistics (remove the -N)
  2. Move to nmon12, which codes around these bugs.

Question 57: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

  • This seems to happen in AIX 5.3 TL07 or thereabouts. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.
  • nmon reports this problem by showing 0% - which is clearly impossible.
  • The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed (for at least some time but may reappear), if you reboot the machine/LPAR.
  • The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. You will have to ask AIX Support by asking for a fix to the libperfstat library to fix the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!
  • In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then do not use any of these percentages.
November 22, 2016, at 03:36 AM by 127.0.0.1 -
Changed line 13 from:
  • Answer in BLACK apply to both versions
to:
  • Answers in BLACK apply to both versions
November 22, 2016, at 03:36 AM by 127.0.0.1 -
Changed lines 543-578 from:

This is in addition to the data management problem.

Due to these three problems:

    Data overload - to many data points
    Averaging out - eliminates the vital data
    Manipulation - the data will need to be stored, manipulated and displayed - non-trivial

I think many people make the mistake that this long term reports from nmon is an easy task but it will turn out to be very hard work and often the results are utterly pointless or meaningless.

If you must attempt this then I recommend:

    rrdtool to summarise data for you and draw graphs
    ploticus looks like a good tool
    take a look at Ganglia

Question 50: nmon will not start on AIX 5.1 due to a libperfstat error?

The error is something like: exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors: 0509-150 Dependent module libperfstat.a(shr.o) could not be loaded. 0509-022 Cannot load module libperfstat.a(shr.o). 0509-026 System error: A file or directory in the path name does not exist.

You will need to have installed the libperfstat library from the AIX CDROMs. This is in bos.perf.libperfstat package.

I hope you realise that AIX 5.1 is not normally supported without extra payments as it is so old. Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

Here is a Korn shell script that shows you where to get the data and the maths involved.

to:
  • This is in addition to these 3 data management problems:
    1. Data overload - too many data points
    2. Averaging out - eliminates the vital data
    3. Manipulation - the data will need to be stored, manipulated and displayed - non-trivial
  • I think many people make the mistake of assuming that long term reports from nmon are an easy task, but it will turn out to be very hard work and often the results are utterly pointless or meaningless.
  • If you must attempt this then I recommend:
    • rrdtool to manage and aggregate the data for you and draw graphs
    • nmon2web
  • Or, for tools not based on nmon data, take a look at:
  • Ganglia
  • LPAR2RRD

Question 50: nmon for AIX will not start on AIX 5.1 due to a libperfstat error?

  • The error is something like:
    • exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors:
    • 0509-150 Dependent module libperfstat.a(shr.o) could not be loaded.
    • 0509-022 Cannot load module libperfstat.a(shr.o).
    • 0509-026 System error: A file or directory in the path name does not exist.
  • You will need to have installed the libperfstat library from the AIX CDROMs.
  • This is in bos.perf.libperfstat package.
  • I hope you realise that AIX 5.1 is not normally supported as it is so old.

Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

Linux

  • Here is a Korn shell script that shows you where to get the data and the maths involved.

[@

Changed line 594 from:
to:

@]

Changed lines 596-608 from:

Question 52: The Disk Busy stats are missing on AIX

If you are watching this on line it will be flashing

To enable disk stats as root: chdev -l sys0 -a iostat=true

at you - this is a big hint on how to switch them on !!! Question 53: Sort order problems with massive nmon output files.

So you collected more than 9999 snapshots in a single nmon capture. Ignoring the fact that the Excel Analyser can't cope with all this data and it makes the data unmanageable. We suggest a good aim is between 400 and 700 snapshots per file for good graphs and manageable file sizes. Anyway, you then find out that if you sort the file the rows don't even sort in the right order. The problem is you have four digit and five digit Timeshot numbers - the T numbers. This mucks up the sort ordering. What can you do? Try this on the AIX system - should work on Linux too, it makes all the T numbers 5 digit and then they can be sorted:

to:

Question 52: The Disk Busy stats are missing on AIX, what do I do?

  • If you are watching this online it will be flashing this message at you:
    • --> To enable disk stats as root: chdev -l sys0 -a iostat=true
  • This is a big hint on how to switch them on!
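
Before changing the setting, it can help to check whether disk statistics are already switched on. The following is a minimal sketch, not an official procedure: the function name iostat_history_enabled is hypothetical, and it assumes that `lsattr -E -l sys0 -a iostat` prints the attribute name in the first column and its current value (true or false) in the second.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `lsattr -E -l sys0 -a iostat` on
# stdin and succeeds (exit status 0) only if the iostat attribute is "true".
# Assumption: column 1 is the attribute name, column 2 is its current value.
iostat_history_enabled() {
    awk '$1 == "iostat" { found = 1; ok = ($2 == "true") }
         END { exit ((found && ok) ? 0 : 1) }'
}

# On AIX, as root (the chdev command is the one from the hint above):
#   lsattr -E -l sys0 -a iostat | iostat_history_enabled ||
#       chdev -l sys0 -a iostat=true
```

The parsing is kept separate from the AIX commands so that it can be tried out on saved lsattr output first.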

Question 53: Sort order problems with massive nmon output files?

  • So you collected more than 9999 snapshots in a single nmon capture. Ignore, for now, the fact that the Excel Analyser can't cope with all this data and that it makes the data unmanageable.
  • I suggest a good aim is between 400 and 700 snapshots per file for good graphs and manageable file sizes.
  • Anyway, you then find out that if you sort the file the rows don't even sort in the right order.
  • The problem is you have four digit and five digit Timeshot numbers - the T numbers.
  • This mucks up the sort ordering.
  • What can you do?
    1. nmon for AIX add the -w 8 option to the end of the command line this makes the Timestamp string 8 digits wide instead of 4 i.e up to T99999999
    2. nmon for Linux does not have this option yet but could.
  • It you have already collected the data fix the nmon file using the below - it makes all the T numbers 5 digit and then they can be sorted:

[@

Changed lines 618-621 from:

sort -n original5digit.csv >fixed.csv

Full marks if you understand the sed command - this is very advanced regular express stuff

to:

sort -n original5digit.csv >fixed5digit.csv @]

  • Full marks if you understand the sed command - this is very advanced regular expression stuff
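
For readers who prefer to see the idea spelled out, here is a minimal Python sketch of the same fix. It is not the sed command referred to above. It assumes each data line starts with a section name followed by the T number as the second comma-separated field (for example `CPU_ALL,T9999,...`); the function name `pad_t_numbers` and the sample lines are made up for illustration.

```python
import re

# Matches "<section>,T<digits>" at the start of a line, e.g. "CPU_ALL,T0042".
# Assumption: the T number is always the second comma-separated field.
_T_FIELD = re.compile(r"^([^,]+),T(\d+)(?=,|$)")


def pad_t_numbers(lines, width=5):
    """Return the lines with the T number zero-padded to `width` digits."""
    fixed = []
    for line in lines:
        fixed.append(
            _T_FIELD.sub(
                lambda m: f"{m.group(1)},T{m.group(2).zfill(width)}", line
            )
        )
    return fixed


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real capture.
    sample = [
        "CPU_ALL,T10000,3.0,4.0",
        "CPU_ALL,T9999,1.0,2.0",
        "AAA,progname,nmon",
    ]
    for line in sorted(pad_t_numbers(sample)):
        print(line)
```

Lines without a T number in the second field (such as the AAA and BBB lines) pass through unchanged, so the padded file sorts as plain text.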

Question 54: Old nmon version: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

Changed lines 626-640 from:
to:
  • This has been reported shortly after an upgrade to a higher AIX 5.3 ML (like ML5 or ML6) and reboot.
  • After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen.
  • The problem seems to be nmon jumping to a library like libperfstat and the jump vectors are not right so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course).
  • This is a bug in AIX and its update process where the libperfstat kernel package does not match the library.
  • Try the following command: # lslpp -L | grep -i perfstat
  • You may get something like:
# lslpp -L | grep -i perfstat
  bos.perf.libperfstat      5.3.0.50    C     F    Performance Statistics Library
  bos.perf.perfstat         5.3.0.60    C     F    Performance Statistics
  • Update the package bos.perf.libperfstat to the same level as bos.perf.perfstat (5.3.0.60), or at least to a much closer level (like 5.3.0.60 and 5.3.0.61). Preferably, use the latest available levels.
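
If you want to script this check, the comparison can be sketched in Python as below. This is only an illustration: it assumes the lslpp output has the fileset name in the first column and the level in the second, as in the sample above, and the function name `fileset_levels` is hypothetical.

```python
def fileset_levels(lslpp_output):
    """Map each bos.perf.* fileset name to its level as a tuple of ints."""
    levels = {}
    for line in lslpp_output.splitlines():
        fields = line.split()
        # Skip the command line itself and anything that is not a fileset row.
        if len(fields) >= 2 and fields[0].startswith("bos.perf."):
            levels[fields[0]] = tuple(int(part) for part in fields[1].split("."))
    return levels


if __name__ == "__main__":
    # The sample output quoted in this answer.
    sample = """# lslpp -L | grep -i perfstat
  bos.perf.libperfstat      5.3.0.50    C     F    Performance Statistics Library
  bos.perf.perfstat         5.3.0.60    C     F    Performance Statistics
"""
    levels = fileset_levels(sample)
    if levels["bos.perf.libperfstat"] != levels["bos.perf.perfstat"]:
        print("Level mismatch:", levels)
```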
Changed lines 642-668 from:

Question 54: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

This has been reported shortly after an upgrade to a AIX 5.3 higher ML (like ML5 or ML6) and reboot. After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen. The problem seems to be nmon jumping to a library like libperfstat and the jump vectors are not right so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course). This is a bug in AIX and its update process where the libperfstat kernel package does not match the library. Try the following command: # lslpp -L | grep -i perfstat

You may get something like:

  1. lslpp -L | grep -i perfstat bos.perf.libperfstat 5.3.0.50 C F Performance Statistics Library bos.perf.perfstat 5.3.0.60 C F Performance Statistics

Update the package bos.perf.libperfstat to the same (5.3.0.60) or at least much closer levels (like 5.3.0.60 and 5.3.0.61) as bos.perf.perfstat. Preferably, the latest available levels.

Question 54: AIX 5.3 updated but then nmon gives "Assert Failure"

This has been reported shortly after an upgrade - some machines have this problems while others don't. There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery. The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true. The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:

to:

Question 54: Old AIX version: AIX 5.3 updated but then nmon gives "Assert Failure"

  • This has been reported shortly after an upgrade - some machines have this problem while others don't.
  • There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery.
  • The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true.
  • The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:

[@

Changed lines 654-729 from:

Then it has been found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, or kernel modules need to be reloaded or libperfstat in a twist - one thing we do know - its not nmon! If you hit this problem:

    Check the software levels, see Question 53
    Do you think that you rebooted after the upgrade or do you know for absolutely sure!!
    Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
    Try the latest beta version of nmon (if it supports your AIX level).
    I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
    If still its a problem, let us know via the usual AIX Performance Tools Forum.

Question 55: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats

This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!).

The work around is:

    Do not include NFS statistics (remove the -N)
    Move to nmon12 that codes around these bugs.

Question 56: Does nmon capture point in time stats or averages?

Well there are two type of numbers

    rates and
    absolutes.

For an absolute example, free memory is an absolute - nmon just show you how much is memory is free. For a rate example, the network stats are rates, here nmon does the following:

    Capture a complete set of counters - these are incremented by the kernel like the number of bytes sent.
    then nmon waits the number of seconds you asked
    then nmon captures a second set of these counters
    then nmon calculates the difference between the two sets and divides by the number of seconds, so everything is per second
    this number is then displayed on screen or written to the data file

So the rates are the average between the two capture points. As the number of seconds increases the rates get more and more steady but note if you reduce the seconds to just one (the minimum to make sure nmon does not use too much CPU time) you will see lots more peaks and dips in the numbers.

"Point in time" numbers would be very misleading as they would miss all the peaks and dips in between - you would have to take dozens of them to be sure you are really seeing a representative number. Question 57: Why is the Process memory percentage zero? (same for System and User percent)

This seems to happen in AIX 5.3 TL07 or there about. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.

nmon reports this problem by showing 0% - which is clearly impossible.

The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed (for at least some time but may reappear), if you reboot the machine/LPAR.

The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. You will have to ask AIX Support by asking for a fix to the libperfstat library to fix the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!

In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then do not use any of these percentages. Question 100: When will nmon collect data from lots of machines or LPARs?

Answer: Never. I like to think nmon does one job and does it well - it collects data from one machine and saves it in one file. Going multiple machine or LPAR has many problems:

    Collecting data from lots of machines or LPARs would require network access and lots of error handling for missing or late data.

    The nmon output file would then be far more complex and have to include the machine names and totally rewrite the time stamps.
    We already suffer from too much data than Excel can handle.
    There would simply be too much data to display
    This complication would mean nmon becomes very large and code stability would take a long time to settle down

What you do need is:

    Less data and then you drill down of particular nodes
    Automated database generation to store the data
    Automated graphing of the data you really want
    History for the last hour, day, week, month year
    Small simple daemons on the nodes and automated central collection point
    Simple method of collecting more stats
    Open Source code to make it safe and simple to implement.

This tool is called Ganglia, see http://ganglia.sourceforge.net/ See Question 101 Question 101: When will nmon collect data like "topas -C"?

It may not be obvious but topas and topas -C are two completely different programs hidden in one binary. The cross partition stats involved communicating with each LPAR and the HMC to get the data unlike the local stats that just calls the local kernel API. The cross partition version of nmon has already been written it is called Ganglia please see http://www.ibm.com/collaboration/wiki/display/WikiPtype/ganglia for more details. OK, it is an excellent Open Source tool and nothing to do with nmon but it is has all the right stats, many brilliant features, is very simple to implement and has very little impact on performance. There is no need to duplicate this work and it also supports lots of operating systems, the output is via a website and the data is in graph form and it keeps historic data - so this is better then text output on a dumb screen and only for root users.

to:

@]

  • It has been found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, kernel modules that need to be reloaded or libperfstat in a twist - one thing we do know - it's not nmon! If you hit this problem:
  1. Check the software levels, see Question 53
  2. Do you think that you rebooted after the upgrade or do you know for absolutely sure?
  3. Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
  4. Try the latest beta version of nmon (if it supports your AIX level).
  5. I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
  6. If it is still a problem, let us know via the usual AIX Performance Tools Forum.

Question 55: Old AIX version: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats?

  • This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!). The workaround is:
  1. Do not include NFS statistics (remove the -N)
  2. Move to nmon12 that codes around these bugs.

Question 56: Does nmon capture point in time stats or averages?

  • Well, there are two types of numbers
    1. rates and
    2. absolutes.
  • For an absolute example, free memory is an absolute
    1. nmon just shows you how much memory is free at that specific point in time.
  • For a rate example, the network stats are rates (so too are CPU and disk KBps). Here nmon does the following:
    1. nmon captures a complete set of counters - these are incremented by the kernel, like the number of bytes sent.
    2. then nmon waits the number of seconds you asked for
    3. then nmon captures a second set of these counters
    4. then nmon calculates the difference between the two sets and divides by the number of seconds, so everything is per second
    5. this number is then displayed on screen or written to the data file
  • So the rates are the average between the two capture points. As the number of seconds increases the rates get more and more steady.
  • Note that if you reduce the time between snapshots, i.e. the seconds, to just one (the minimum, to make sure nmon does not use too much CPU time) you will see lots more peaks and dips in the numbers.
  • "Point in time" numbers for rates would be very misleading as they would miss all the peaks and dips in between - you would have to take dozens of them to be sure you are really seeing a representative number.
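
The rate calculation described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not nmon's actual code; the counter name `bytes_sent` and the function name `per_second_rates` are made up for the example.

```python
def per_second_rates(first, second, interval_seconds):
    """Average per-second rate of each counter between two snapshots.

    `first` and `second` are dicts of ever-increasing counters captured
    `interval_seconds` apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (second[name] - first[name]) / interval_seconds
        for name in first
    }


if __name__ == "__main__":
    # Hypothetical counters captured 5 seconds apart.
    before = {"bytes_sent": 1_000}
    after = {"bytes_sent": 6_000}
    print(per_second_rates(before, after, 5))
```

Note that 5000 bytes sent in one burst during a single second and 1000 bytes sent in each of the five seconds give the same answer, which is why longer intervals look steadier.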

Question 57: Old AIX version: Why is the Process memory percentage zero? (same for System and User percent)

  • This seems to happen in AIX 5.3 TL07 or thereabouts. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.
  • nmon reports this problem by showing 0% - which is clearly impossible.
  • The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed (for at least some time but may reappear), if you reboot the machine/LPAR.
  • The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. You will have to ask AIX Support by asking for a fix to the libperfstat library to fix the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!
  • In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then do not use any of these percentages.

Question 100: When will nmon collect data from lots of machines or LPARs?

  • Answer: Never.
  • I like to think nmon does one job and does it well - it collects data from one Server / virtual machine and saves it in one file.
  • Going multiple machine or LPAR has many problems:
    1. Collecting data from lots of machines or LPARs would require network access and lots of error handling for missing or late data.
    2. The nmon output file would then be far more complex and have to include the machine names and totally rewrite the time stamps.
    3. We already suffer from more data than Excel can handle.
    4. There would simply be too much data to display
    5. This complication would mean nmon becomes very large and code stability would take a long time to settle down
  • What you do need is:
    1. Less data, and then you drill down on particular nodes
    2. Automated database generation to store the data
    3. Automated graphing of the data you really want
    4. History for the last hour, day, week, month, year
    5. Small simple daemons on the nodes and automated central collection point
    6. Simple method of collecting more stats
    7. Open Source code to make it safe and simple to implement.
  • This tool is called Ganglia, see http://ganglia.sourceforge.net/ See Question 101

Question 101: When will nmon collect data like the AIX "topas -C"?

  • It may not be obvious but topas and topas -C are two completely different programs hidden in one binary.
  • The cross partition stats involve communicating with each LPAR and the HMC to get the data, unlike the local stats that just call the local kernel API.
  • The cross partition version of nmon has already been written. It is called Ganglia, please see http://www.ibm.com/collaboration/wiki/display/WikiPtype/ganglia for more details. OK, it is an excellent Open Source tool and nothing to do with nmon but it has all the right stats, many brilliant features, is very simple to implement and has very little impact on performance.
  • There is no need to duplicate this work and it also supports lots of operating systems, the output is via a website and the data is in graph form and it keeps historic data - so this is better than text output on a dumb screen and only for root users.
 - - - The End - - -
November 22, 2016, at 02:45 AM by 127.0.0.1 -
Changed lines 11-12 from:
  • Answers in %redRED are for very old versions of nmon for AIX
  • Answers in %blueBLUE are for nmon for Linux
to:
  • Answers in RED are for very old versions of nmon for AIX
  • Answers in BLUE are for nmon for Linux
Changed lines 486-518 from:

This is the character I/O that a process is generating and it is counted from calls to the read() and write() systems calls. I/O started in other ways like Async I/O (commonly used by an RDBMS), paging or memory mapped files are not included. The number fetch from the AIX kernel using the getprocs64() system call and the structure found in /usr/include/procinfo.h - look for the pi_ioch variable.

Question 45: On Linux the disk stats are all doubled?

nmon collects the data from /proc and displays it. On newer Kernels this is ht e/proc/diskstats file. It was decided a long time ago that hiding data was a very bad idea as it can go wrong and then be very misleading - this is how the ozone hole was missed for 5 years and not detected - the algorithm decided the data must be wrong and deleted it from the stats. The Linux disk stats (in three different files and four formats depending on the Linux version - great coding guys!!) reports both disk level and disk partition level stats in the same file. nmon just shows you the stats - it is your job to understanding them. nmon does not and with LUNs on SAN disks and software RAID and LVM's it is much safer to show everything. Question 46: On AIX the disk seem to be mostly on the first adapter?

nmon now collects the adapter data from AIX libperfstat. This is the addition of the disk stats added up by knowning which disk is conected to which adapter. This of course, is complex for mutlipath IO disks. AIX seems to build this map from the order in which disks are discovered rather than used. Depending on your initial setup it can often mean that most disks are assigned the first one or two adapters. Sorry, there is nothing that nmon can do about this. To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?

This number is not available in the /proc filesystem until the 2.6 kernel and then it appears in the undocumented fields at the end of a line - I have fixed this for the 2.6 kernels in nmon for Linux version 11c. Question 48: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.

This data was very hard to locate and now appear in nmon for Linux version 11d onward for the 2.6 kernel. Before this kernel version the data is not present in /proc. Question 49: I want to collect data every second and then see weekly and monthly reports. How?

Let us take this in simple bite-size chunks:

    First, a piratical point, most Laptop and PC screens are 1024x x768 pixels. The point is that no matter how many data points you have you can not even see a maximum of about 800 data points. This is why I recommend about 300 to 400 data captures with nmon to get good looking graphs.
    Second, one second stats for a day give you (60 x 60 x 24) 86400 data points! So OK let us try one minute stats then we have 1440 data points, which is still to many. So we need to move to 5 minutes captures and we get to a sensible 288 data points and a good looking graph.
    Third, we then collect data for a month 288 x 31 = 8928 data points - oh dear far to may data points again!! so now we have to drop down to once an hour data capture (24 x 31) and we hav 741 data points which is only just possible - we had better start thinking about the purchase of a bigger screen.
    If you then want to compare months or have a yearly report ... well you get the idea by now, we are now monitoring 12 hour periods.

But the above is only a physical problem. The much larger logical problem is still there to catch you out and that problem is averaging out. A long time ago I noticed that the shorter the time period that you use to monitor the more fluxuations you notice in the data.

Philosophy: If you keep using shorter and shorter periods you will eventually see that the CPUs are either 100% busy or 100% idle all the other numbers are just a feature of humans not thinking fast enough and having to average out the CPU use in longer periods.

Anyway, for performance tuning we need to concentrate on the peaks. Take a look at the below graph:

If we average the whole day we get 50% which completely hid the peaks of the data time and the heavy CPU load during the evening batch. If this computer was not used during Saturday and Sunday the average might come down to 35%. The point is averaging data over longer periods removed all the important peaks.

to:
  • This is the Character I/O that a process is generating and it is counted from calls to the read() and write() system calls.
  • This will include I/O to files, terminals (now rare), FIFOs, pipes and network sockets.
  • I/O started in other ways like Async I/O (commonly used by an RDBMS), paging or memory mapped files is not included.
  • The number is fetched from the AIX kernel using the getprocs64() system call and the structure found in /usr/include/procinfo.h - look for the pi_ioch variable.

Question 45: On Linux the disk stats are all doubled?

Linux

  • nmon collects the data from /proc and displays it.
  • On newer Linux Kernels this is the /proc/diskstats file.
  • It was decided a long time ago that hiding data was a very bad idea as it can go wrong and then be very misleading
    • This is how the ozone hole was missed for 5 years and not detected - the algorithm decided the data must be wrong and deleted it from the stats.
  • The Linux disk stats (in three different files and four formats depending on the Linux version - great coding guys!!) report both disk level and disk partition level stats in the same file. nmon just shows you the stats - it is your job to understand them.
  • nmon does not try to filter them: with LUNs on SAN disks, software RAID and LVMs it is much safer to show everything.
  • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.

Question 46: On AIX the disks seem to be mostly on the first adapter?

AIX

  • nmon now collects the adapter data from AIX libperfstat.
  • The adapter numbers are the disk stats added up by knowing which disk is connected to which adapter.
  • This, of course, is complex for multipath IO disks.
  • AIX seems to build this map from the order in which disks are discovered rather than used.
  • Depending on your initial setup it can often mean that most disks are assigned to the first one or two adapters.
  • Sorry, there is nothing that nmon can do about this.
  • To list what nmon is extracting from the libperfstat library you can use the sample code and the binaries precompiled for AIX 5.3 (and onwards) from the Roll Your Own Wiki page at ryo
  • Consider using the nmon feature called "User Defined Disk Groups" to remove the doubling and make disks simpler to understand.

Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?

  • This number is not available in the /proc filesystem until the 2.6 kernel and then it appears in the undocumented fields at the end of a line - I have fixed this for the 2.6 kernels in nmon for Linux version 11c onwards.

Question 48: nmon for Linux has paging details missing and the PAGE lines for the capture to file are missing.

  • This data was very hard to locate and now appears in nmon for Linux version 11d onward for the 2.6 kernel.
  • Before this kernel version the data is not present in /proc.

Question 49: I want to collect data every second and then see weekly and monthly reports. How?

  • Let us take this in simple bite-size chunks:
    1. First, a practical point: most Laptop and PC screens are 1024 x 768 pixels (or about 1.5 times that). The point is that no matter how many data points you have, you cannot see more than about 800 of them. This is why I recommend about 300 to 400 data captures with nmon to get good looking graphs.
    2. Second, one second stats for a day give you (60 x 60 x 24) 86400 data points! So OK let us try one minute stats then we have 1440 data points, which is still too many. So we need to move to 5 minute captures and we get to a sensible 288 data points and a good looking graph.
    3. Third, we then collect data for a month 288 x 31 = 8928 data points - oh dear, far too many data points again!! So now we have to drop down to once an hour data capture (24 x 31) and we have 744 data points, which is possible - we had better start thinking about the purchase of a bigger screen.
    4. If you then want to compare months or have a yearly report ... well you get the idea by now, we are now monitoring 12 hour periods.
  • But the above is only a physical problem. The much larger logical problem is still there to catch you out and that problem is averaging out.
  • A long time ago I noticed that the shorter the time period that you use to monitor, the more fluctuations you notice in the data.
  • Philosophy: If you keep using shorter and shorter periods you will eventually see that the CPUs are either 100% busy or 100% idle. All the other numbers are just a feature of humans not thinking fast enough and having to average out the CPU use in longer periods.
  • Anyway, for performance tuning we need to concentrate on the peaks. Take a look at the below graph:
  • If we average the whole day we get 50%, which completely hides the peaks of the day time and the heavy CPU load during the evening batch. If this computer was not used during Saturday and Sunday the average might come down to 35%. The point is that averaging data over longer periods removes all the important peaks.
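
The arithmetic above, and the way an average hides peaks, can be checked with a short Python sketch. The hourly CPU figures are invented purely for illustration (they are not the data behind the graph) and the function name `snapshot_count` is hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def snapshot_count(period_seconds, interval_seconds):
    """Number of snapshots taken over a period at a fixed interval."""
    return period_seconds // interval_seconds


# Made-up hourly CPU busy % for one day: quiet night, busy morning,
# moderate afternoon, heavy evening batch, quiet late evening.
hourly_busy = [5] * 8 + [85] * 4 + [50] * 8 + [95] * 2 + [10] * 2

daily_average = sum(hourly_busy) / len(hourly_busy)
daily_peak = max(hourly_busy)

if __name__ == "__main__":
    print("1 second for a day:   ", snapshot_count(SECONDS_PER_DAY, 1))
    print("1 minute for a day:   ", snapshot_count(SECONDS_PER_DAY, 60))
    print("5 minutes for a day:  ", snapshot_count(SECONDS_PER_DAY, 300))
    print("5 minutes for 31 days:", snapshot_count(31 * SECONDS_PER_DAY, 300))
    print("1 hour for 31 days:   ", snapshot_count(31 * SECONDS_PER_DAY, 3600))
    print("daily average %:", daily_average, "daily peak %:", daily_peak)
```

With these invented figures the single daily average sits far below the morning and evening peaks, which is exactly the information performance tuning needs.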
November 22, 2016, at 01:56 AM by 127.0.0.1 -
Changed lines 183-188 from:

@@

    mkfifo /tmp/xyz
    nmon -F /tmp/xyz s 5 c 300
    your-command </tmp/xyz

@@

to:
    mkfifo /tmp/xyz
    nmon -F /tmp/xyz s 5 c 300
    your-command </tmp/xyz
Changed lines 352-465 from:

This question is asked a lot and it can mean your CPUs are actually too fast!

CPU "waiting for I/O" state and utilisation numbers (as opposed to User, System and Idle) means the CPU is Idle but has a disk I/O outstanding. In history this was used to highlight that your application is being held up by slow disks or disks problems. In the Wait for I/O state the CPU is actually free to do other work and the CPU is NOT looping waiting for the disk - it in fact actioned the adapter to perform the disk I/O, put the calling process to sleep and carried on. If there is no other process it is in the same loop as in the Idle state i.e. it is available to do other things. In AIX the processor does one of two things

    in regular stand-alone machines or a dedicate CPU LPAR the process runs a special kernel level process called "wait" from which it can exit very quickly at the arrival of the next interrupt
    In a micro-partition (Shared Processor LPAR) the processor after a few micro seconds will call the Hypervisor to yield the processor for other LPARs

In benchmarks, Wait for I/O is seen positively as an opportunity - we can do throw in more work to boost throughput.

Any workload in which the CPU does comparatively little work compared to the volume of disk I/O is going to give you high Wait for I/O.

If this high Wait for I/O is a sudden change from the normal pattern then it needs investigating and you should make sure as many disks as possible are involved in the disk I/O.

But lots of workloads just run like this - a common example I come across regularly is SAP databases. SAP cleverly caches lots of data but on large database it has to do lots of disk I/O for particular customer or whatever records. Once the data is available it is sent to the SAP application servers i.e. little work is done on the database.

In fact, faster CPUs would mean even high wait values.

Question 35: On AIX, free memory is near zero, how do I free more memory?

This is just how AIX works and is perfectly normal. All of memory will be soaked up with copies of filesystem blocks after a reasonable length of time and the free memory will be near zero. AIX will then use the lrud process to keep the free list at a reasonable level. If you see the lrud process taking more than 30% of a CPU then you need to investigate and make memory parameter changes. Question 36: How can I set numperm better?

You can't. This number just reflects the amount of memory being used for disk blocks - called the buffer cache. It is controlled by three parameters minperm, maxperm and strictperm but these set thresholds and algorithms. The actual numperm number reflects what is actually going on. You will have to find other places for tuning these parameters as it is beyond the scope of this FAQ.

It is also worth noting that the nmon values for numperm and maxperm are based on a percentage of physical memory. The AIX commands report a percentage but not of all memory - they seem to remove some memory that might be something like the memory allocated to the AIX kernel (i.e. it could never be used as cache). Unfortunately this is not documented and the memory size not counted is not available with any public API. So nmon does the best it can but the numbers will not be absolutely the same. Question 37: What format is the nmon output file?

Plain ASCII text that you can edit and editable with vi (but you might hit the 2048 byte line limit on the AIX vi). I use the Open Source vim on AIX to avoid this or do it on Linux.

    The first token on the line tells you what sort of data it is
        AAA lines are basic nmon data about this collection of data
        BBB lines are about the configuration of the machine
        ZZZ lines include the date and time stamp stored here once ro reduce output
        others should be obvious
    the second field is the Timestamp - see the ZZZ section to the actual time
    then there is the data
    each sort of data (CPU, DISK, etc.) has a Header line that describes the columns and the header lines also include the graph titles

You do not need to sort the nmon output file for nmon2rrd or the Analyser but it you do then you can see the sections easier for editing. Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?

You have 28800 data points and you want to see this on a screen with say 1024 pixels wide !!

    that is 29 data points per pixel.

My new Thinkpad has 1400 pixels across the screen, so I am down to just 18 data points per pixel

    what where you thinking !!

I think even with the best will in the world, the analyser spreadsheet is going to struggle. On a tiny machine you get about 1.5KB per snapshot and a normal size machine with a few nmon options it is more like 60KB each. At 60KB the maths --> 28800*60KB = 1.6GB. How big is your output file? I hope you have at least 4 GBs of memory in your PC to handle this!

As I hope you know the nmon file is text and editable with vi (but you might hit the 2048 byte line limit on the AIX vi). I use the Open Source vim on AIX to avoid this or do it on Linux. If you take a look at the file format you should be able to cut done the file size and make a series of files but each will need the header section that you will find at the top of the file and then a different set of snapshots.

Question 39: nmon does not work on my Linux machine!!

nmon runs on x86 (Intel and AMD), mainframe and POWER processors and on a dozen or so versions of Linux. If you report problems I will need to know which platform and which Linux version plus distro before I can help so please include these with initial questions.

Question 40: When do we get nmon 10 for Linux?

The Linux & AIX source code for nmon is very different apart from curses framework and basic approach. AIX gets all the information from system and library calls and in Linux this has to be read from the /proc filesystem. This means the AIX code is more straight forward. So there is no need for Linux and AIX to have the same version number. From nmon version 11, the AIX and Linux user interfaces where made the same and release with the same version number to keep people happy. There was no nmon for Linux version 10.

Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?

nmon uses curses to handle the displaying of characters on the terminal. This is controlled mostly by your TERM variable setting. The nmon developer tests with all of the above. They work perfectly and they work perfectly all the time. If it does not work for you then you have some setting wrong on your machine or X Windows or have some strange settings for TERM and/or TERMINFO shell variable setting or you are using a duff terminal emulator.

Let me state that again: your system has a problem not nmon.

The TERM shell variable should be set to the terminal emulator you are using.

    If you are using a xterm then TERM should be xterm
    If you are using DTterm then TERM should be dtterm
    If you are using an AIX term then TERM should be set to aixterm
    Get the idea - other combinations are your problem.

Unless you are using a genuine 1970's DEC VT100 then you should not be using this setting with more advanced terminal emulators. I remember VT100's well, even found a bug in the firmware once!

The TERMINFO variable should not be set to anything (in fact not set at all). If it is then you or someone has been mucking about with terminfo databases and why are you blaming nmon?

Terminal Emulators:

    xterm works well in black and white.
    aixterm works well and has colour and nmon uses the colour.
    DTterm works well and has colour and nmon uses the colour.
    rxvt and xterm-color combination (see WWW for details on setup, on google.com search for xterm-color and AIX) - this combination also lets vim (the improved vi from Open Source) use syntax highlighting in C code.
    The Windows telnet terminals emulation is very poor indeed and not recommended under any circumstances - you are on your own.
    The best alternative on a Windows PC is putty (see WWW for details and download) and is highly recommended - I use this every day - this will work with TERM set to xterm perfectly.
    VNC is, of course, even better and gives you X windows on a Windows PB at zero cost - again highly recommended.

The -B option starts nmon with no boxes (or colour). Some purists do not like to waste the screen space with the box lines. You could add 'B' to the NMON shell variable to make this automatic: export NMON=B

Question 42: I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?

I guess you are learning the folly of small LUNs and that it makes the totally machine unmanageable. But you are not the first or worst - the record stands at 4500. Some suggestions:

    Have you got more than four paths to each LUN?
        If yes, you need to fix this ASAP as it is bad for performance and terrible for RAS (and I mean really bad).
    Use the -D flag to stop nmon collecting disk configuration each time can really help the start up time.
    Collect this disk configuration just the once - unless you are changing the disks a lot!!
    You can use nmon User Defined Disk Groups to limit the output but nmon will still have to collect all the data from all the disks and then reduce what is actually reported.
    But the only real solution is to reduce the number of disks you have - yes, I know this is a lot of work but you have a machine setup that can not be managed and that is not viable in the long term.
    Don't blame nmon for highlighting the issue.

I recommend 32 to 64 LUNs and make the disk subsystem do the hard work of spreading the data across disk - i.e. not you. After all that is what you buy big disk subsystems for and there a better uses of your time and thought.

Question 43: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

Correct, this data is not available on AIX 5.1 from the libperfstat library. This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes. Recommended action upgrade AIX as 5.1 is not supported without purchasing extended support.

Question 44: What is CharIO (a column of the TOP processes stats)?

to:
  • This question is asked a lot and it can mean your CPUs are actually too fast!
  • CPU "waiting for I/O" state and utilisation numbers (as opposed to User, System and Idle) mean the CPU is Idle but has a disk I/O outstanding. Historically this was used to highlight that your application is being held up by slow disks or disk problems. In the Wait for I/O state the CPU is actually free to do other work and the CPU is NOT looping waiting for the disk - it in fact actioned the adapter to perform the disk I/O, put the calling process to sleep and carried on. If there is no other process it is in the same loop as in the Idle state i.e. it is available to do other things. In AIX the processor does one of two things:
    • In regular stand-alone machines or a dedicated CPU LPAR the processor runs a special kernel level process called "wait" from which it can exit very quickly at the arrival of the next interrupt
    • In a micro-partition (Shared Processor LPAR) the processor after a few microseconds will call the Hypervisor to yield the processor for other LPARs
  • In benchmarks, Wait for I/O is seen positively as an opportunity - we can throw in more work to boost throughput.
  • Any workload in which the CPU does comparatively little work compared to the volume of disk I/O is going to give you high Wait for I/O.
  • If this high Wait for I/O is a sudden change from the normal pattern then it needs investigating and you should make sure as many disks as possible are involved in the disk I/O.
  • But lots of workloads just run like this - a common example I come across regularly is SAP databases. SAP cleverly caches lots of data but on large databases it has to do lots of disk I/O for particular customer or whatever records. Once the data is available it is sent to the SAP application servers i.e. little work is done on the database.
  • In fact, faster CPUs would mean even higher wait values, because the CPU finishes its share of the work sooner and then spends more of the time idle with disk I/O outstanding.

Question 35: On AIX, free memory is near zero, how do I free more memory?

AIX

  • This is just how AIX works and is perfectly normal. All of memory will be soaked up with copies of filesystem blocks after a reasonable length of time and the free memory will be near zero. AIX will then use the lrud process to keep the free list at a reasonable level.
  • If you see the lrud process taking more than 30% of a CPU then you need to investigate and make memory parameter changes.

Question 36: How can I set numperm better?

  • You can't. This number just reflects the amount of memory being used for disk blocks - called the buffer cache. It is controlled by three parameters minperm, maxperm and strictperm but these set thresholds and algorithms. The actual numperm number reflects what is actually going on. You will have to find other places for tuning these parameters as it is beyond the scope of this FAQ.
  • It is also worth noting that the nmon values for numperm and maxperm are based on a percentage of physical memory. The AIX commands report a percentage but not of all memory - they seem to remove some memory that might be something like the memory allocated to the AIX kernel (i.e. it could never be used as cache). Unfortunately this is not documented and the memory size not counted is not available with any public API. So nmon does the best it can but the numbers will not be absolutely the same.

Question 37: What format is the nmon output file?

  • Plain ASCII text that you can edit with vi (but you might hit the 2048 byte line limit on the AIX vi).
  • I use the Open Source vim on AIX to avoid this or do it on Linux.
  • The first token on the line tells you what sort of data it is
    • AAA lines are basic nmon data about this collection of data
    • BBB lines are about the configuration of the machine
    • ZZZ lines include the date and time stamp, stored here once to reduce output
    • others should be obvious
  • The second field is the Timestamp - see the ZZZ section for the actual time
  • Then there is the data
  • Each sort of data (CPU, DISK, etc.) has a Header line that describes the columns and the header lines also include the graph titles
  • You do not need to sort the nmon output file for nmon2rrd or the Analyser but if you do then you can see the sections more easily for editing.
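
The layout described above can be read with a few lines of code. The following is a minimal sketch, not part of nmon or its tools: the function name parse_nmon and the sample lines (tag names, columns and values) are assumptions made up for illustration, and it simply treats any line whose first token starts with ZZZ as a time stamp line. It collects the time stamps first, so it does not matter whether the file has been sorted.

```python
from collections import defaultdict

# Illustrative sample only: these tags, columns and values are assumptions,
# not output captured from a real nmon run.
SAMPLE = """\
AAA,host,demo
CPU_ALL,CPU Total demo,User%,Sys%,Wait%,Idle%
ZZZZ,T0001,10:00:00,03-JAN-2017
CPU_ALL,T0001,5.0,2.0,1.0,92.0
ZZZZ,T0002,10:00:10,03-JAN-2017
CPU_ALL,T0002,7.5,2.5,0.0,90.0
"""

def parse_nmon(text):
    """Split nmon-style text into header lines, time stamps and data rows."""
    lines = [ln.split(",") for ln in text.splitlines() if "," in ln]
    # First pass: map each snapshot id (second field) to its date and time.
    times = {f[1]: " ".join(f[2:]) for f in lines if f[0].startswith("ZZZ")}
    headers = {}
    rows = defaultdict(list)
    # Second pass: a line is data if its second field is a known snapshot id,
    # otherwise it is taken to be the header line for that sort of data.
    for f in lines:
        tag, second = f[0], f[1]
        if tag.startswith(("AAA", "BBB", "ZZZ")):
            continue
        if second in times:
            rows[tag].append((times[second], f[2:]))
        else:
            headers[tag] = f[2:]
    return headers, times, dict(rows)

headers, times, rows = parse_nmon(SAMPLE)
print(headers["CPU_ALL"])
print(rows["CPU_ALL"])
```

Values are left as strings because not every section holds numbers; convert them as needed for the sections you care about.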

Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?

  • You have 28800 data points and you want to see this on a screen say 1024 pixels wide!!
    • That is about 28 data points per pixel.
  • My new Thinkpad has 1400 pixels across the screen, so I am down to about 21 data points per pixel
    • What were you thinking when you started collecting so much data!!
  • I think even with the best will in the world, the Analyser spreadsheet is going to struggle at some point with too much data.
  • On a tiny machine you get about 1.5KB per snapshot and on a normal size machine with a few nmon options it is more like 60KB each. At 60KB the maths --> 28800*60KB = roughly 1.6GB. How big is your output file?
  • I hope you have at least 16 GB of memory in your PC to handle this!
  • As I hope you know the nmon file is text and editable with vi (but you might hit the 2048 byte line limit on the AIX vi). I use the Open Source vim on AIX to avoid this or do it on Linux. If you take a look at the file format you should be able to cut down the file size and make a series of files, but each will need the header section that you will find at the top of the file and then a different set of snapshots.
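
Cutting the file down by hand is tedious, so here is a minimal sketch of one way to thin it out instead: keep every header and configuration line and only every Nth snapshot. The names thin_nmon, snapshot_number and points_per_pixel are hypothetical, and the sketch assumes snapshot ids look like T0001 and appear in the second or third field of a data line. Note that this drops data rather than averaging it, so short peaks can disappear.

```python
import re

# Assumption: snapshot ids are a 'T' followed by at least four digits.
SNAPSHOT_ID = re.compile(r"T\d{4,}")

def snapshot_number(line):
    """Return the snapshot number of a data line, or None for other lines."""
    for field in line.split(",")[1:3]:
        if SNAPSHOT_ID.fullmatch(field):
            return int(field[1:])
    return None

def thin_nmon(lines, keep_every):
    """Keep header/config lines plus snapshots 1, 1+keep_every, 1+2*keep_every, ..."""
    kept = []
    for line in lines:
        number = snapshot_number(line)
        if number is None or (number - 1) % keep_every == 0:
            kept.append(line)
    return kept

def points_per_pixel(snapshots, pixels):
    """How many data points have to share one pixel column on the graph."""
    return snapshots / pixels

print(points_per_pixel(28800, 1024))       # one second snapshots for 8 hours
print(points_per_pixel(28800 / 30, 1024))  # after keeping every 30th snapshot
```

To use it on a file, read the lines, call thin_nmon(lines, 30) or similar, and write the result to a new file before loading it into the Analyser.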

Question 39: nmon does not work on my Linux machine!

  • nmon runs on x86 (Intel and AMD), mainframe, ARM and POWER processors and on a dozen or so versions of Linux.
  • If your Linux system has a C compiler and ncurses you can have nmon running in a minute or two.
  • If you report problems I will need to know which platform and which Linux version plus distro before I can help so please include these with initial questions.

Question 40: When do we get nmon for AIX version X for Linux?

  • The Linux & AIX source code for nmon is very different apart from the curses framework and basic approach.
  • AIX gets all the information from system and library calls (with two exceptions) and in Linux this has to be read from the /proc filesystem and some classic UNIX style kernel functions.
  • This means the AIX code is more straightforward.
  • The code bases of the AIX and Linux versions are completely different.
  • So there is no need for Linux and AIX to have the same version number.

Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rxvt, putty, VNC, (whatever you have)?

  • nmon uses curses to handle the displaying of characters on the terminal.
  • This is controlled mostly by your TERM variable setting.
  • The nmon developer tests with all of the above.
  • They work perfectly and they work perfectly all the time.
  • If it does not work for you then you have some setting wrong on your machine or X Windows, or have some strange setting for the TERM and/or TERMINFO shell variables, or you are using a duff terminal emulator.
  • For example, if you tell putty this is an xterm session via the putty settings but tell Linux this is a vt100 session with TERM=vt100 then expect odd things to happen.
  • Let me state that again: your system has a problem not nmon.
  • The TERM shell variable should be set to the terminal emulator you are using.
    • If you are using a xterm then TERM should be xterm
    • If you are using DTterm then TERM should be dtterm
    • If you are using an AIX term then TERM should be set to aixterm
    • Get the idea - other combinations are your problem.
    • Unless you are using a genuine 1970's DEC VT100 then you should not be using this setting with more advanced terminal emulators. I remember VT100's well, even found a bug in the firmware once!
    • The TERMINFO variable should not be set to anything (in fact not set at all). If it is then you or someone has been mucking about with terminfo databases and why are you blaming nmon?
  • Terminal Emulators:
    • xterm works well in black and white.
    • aixterm works well and has colour and nmon uses the colour.
    • DTterm works well and has colour and nmon uses the colour.
    • rxvt and xterm-color combination (see WWW for details on setup, on google.com search for xterm-color and AIX) - this combination also lets vim (the improved vi from Open Source) use syntax highlighting in C code.
    • The Windows telnet terminal emulation is very poor indeed and not recommended under any circumstances - you are on your own.
    • The best alternative on a Windows PC is putty (see WWW for details and download) and is highly recommended - I use this every day - this will work with TERM set to xterm perfectly.
    • VNC is, of course, even better and gives you X windows on a Windows workstation at zero cost - again highly recommended.
  • The -B option starts nmon with no boxes (or colour). Some purists do not like to waste the screen space with the box lines. You could add 'B' to the NMON shell variable to make this automatic: export NMON=B
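
The TERM and TERMINFO advice above can be turned into a quick self-check to run before blaming nmon. This is only a sketch: check_terminal_env is a hypothetical helper, not something shipped with nmon, and it only tests the two rules stated in this answer (do not use vt100 with a modern emulator, and leave TERMINFO unset). It cannot tell which emulator you are really using.

```python
import os

def check_terminal_env(env):
    """Return a list of warnings about TERM and TERMINFO in the given mapping."""
    problems = []
    term = env.get("TERM", "")
    if not term:
        problems.append("TERM is not set; set it to match your terminal emulator")
    elif term.lower().startswith("vt100"):
        problems.append("TERM=%s; only right for a genuine VT100" % term)
    # The FAQ advice is that TERMINFO should not be set at all.
    if "TERMINFO" in env:
        problems.append("TERMINFO is set; unset it unless you know why it is there")
    return problems

if __name__ == "__main__":
    for problem in check_terminal_env(os.environ):
        print(problem)
```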

Question 42: I have 2400 disks (or 2400 small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?

  • I guess you are learning the folly of small LUNs and that it makes the machine totally unmanageable. But you are not the first or worst - the record stands at 4500. Some suggestions:
    • Have you got more than four paths to each LUN?
      • If yes, you need to fix this ASAP as it is bad for performance and terrible for RAS (and I mean really bad).
    • Using the -D flag to stop nmon collecting the disk configuration each time can really help to reduce the start up time.
    • Collect this disk configuration just the once - unless you are changing the disks a lot!!
    • You can use nmon User Defined Disk Groups to limit the output but nmon will still have to collect all the data from all the disks and then reduce what is actually reported.
    • But the only real solution is to reduce the number of disks you have - yes, I know this is a lot of work but you have a machine setup that can not be managed and that is not viable in the long term.
    • Don't blame nmon for highlighting the issue.
    • I recommend 32 to 64 LUNs and make the disk subsystem do the hard work of spreading the data across disks - i.e. not you, as your time is much more valuable. After all that is what you buy big disk subsystems for and there are better uses of your time and thought.

Question 43: Old nmon for AIX: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?

AIX

  • Correct, this data is not available on AIX 5.1 from the libperfstat library.
  • This also causes a problem on nmon2rrd version 10 where it expects the IOADAPT section and crashes.
  • Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.

Question 44: What is CharIO (a column of the TOP processes stats)?

Added line 489:
November 22, 2016, at 01:21 AM by 127.0.0.1 -
Changed lines 216-244 from:

Question 16: Why don't you add a Java front end to nmon and get graphical output?

Answer: I don't have the time. If you can give me a frame work for getting C functions to generate graphs, please let me know.

Question 17: The command line options don't seem to work right for file capture?

Answer: The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them. This option sets all the other option flags. You can then use the other flags to modify their default behaviour. This has improved with the latest nmon versions.

Question 18: What is paging to a filesystem?

Hopefully, you already understand paging to paging space (also called virtual memory). AIX (and other UNIX versions) page in the read-only code from a program as you start it and as it runs. This is just like paging in from the paging space but is directly from the filesystem, this is also true for shared libraries (which you might not be aware you are using). Also programs using memory mapped files access the files by simply reading and writing memory addresses - AIX will page in the file pages as necessary and they will get paged back to the filesystem to free up memory or if the program forces it.

Question 19: Where can I get nmon and further information?

Answer: From this Wiki !! The data displayed by nmon are similar to the displays generated by the standard AIX commands such as vmstat, iostat, netpnmon, df, and sar. Use the AIX manual pages for these standard commands to understand what the displayed data means.

Following are several useful IBM Redbooks that you can buy or download for free from http://www.redbooks.ibm.com/portals/unix:

    Understanding IBM pSeries Performance and Sizing (new version SG24-4810-1) 400 pages.
    For Performance tuning on pSeries and AIX - Database Performance on AIX in the DB2 UDB and Oracle Environments (SG24-5511) 450 pages. The techies bible for tuning these databases for high performance.
    AIX 5L Performance Tools Handbook (SG24 6039) 950 pages - All the latest tools for AIX5L including truss and WLM.
    PowerVM Virtualization on IBM System p: Introduction and Configuration Fourth Edition - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247940.html
    AIX 5L Practical Performance Tools and Tuning Guide - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246478.html
    AIX Performance Management Guide - http://publib.boulder.ibm.com/infocenter/systems/scope/aix/index.jsp?topic=/com.ibm.aix.doc/doc/base/performance.htm&tocNode=toc:com.ibm.aix.doc/aix/7/

Question 20: nmon crashes after about 200 snapshots on AIX?

If you request Workload Manager stats and have WLM switches off then due to bugs in AIX and a huge memory leak in the libwlm library, nmon will grow in size every time it fails to access the WLM stats until it hits 256 MB and will then crash. This is fixed in nmon 11 by switching off WLM stats after a few failed attempts.

Question 21: TOP process stats get switched on when I request Asynchronous I/O stats?

This is working as normal. To get the aioserver stats the details of all processes has to be collected, sorted and searched.

to:

Question 16: Why don't you add a Java front end to nmon and get graphical output?

  • I don't have the time or the interest.
  • I have had a great laugh at Linux tools that do this sort of thing but then they highlight that the graphing takes serious CPU cycles. I have seen very simple tools take from 20% to 100% of a CPU - which is not what nmon is all about. I don't want to waste server CPU time collecting the data when that CPU should be used for running the application, RDBMS or whatever.
  • nmon aims to keep below a few percent of one CPU - this gets smaller as CPUs get faster.

Question 17: The command line options don't seem to work right for file capture?

  • The -f, -F, -x, -X or -z MUST be the first option on the line and only one of them.
  • This is documented in the nmon -h output.
  • This option sets all the other option flags to a sensible set.
  • You can then use the other flags to modify their default behaviour.

Question 18: What is paging to a filesystem (rather than to paging space)?

  • Hopefully, you already understand paging to paging space (also called virtual memory).
  • There are other types of paging.
  • AIX (and other UNIX versions) page in the read-only code from a program as you start it and as it runs. This is just like paging in from the paging space but is directly from the filesystem, this is also true for shared libraries (which you might not be aware you are using).
  • Also programs using memory mapped files access regular filesystem files - this allows access by simply reading and writing memory addresses - AIX will page in the file pages as necessary and they will get paged back to the filesystem to free up memory or if the program forces it or if the program stops.

Question 19: Where can I get nmon and further information?

  • The data displayed by nmon are similar to the displays generated by the standard AIX and Linux commands such as vmstat, iostat, netpmon, df, and sar. Use the AIX and Linux manual pages for these standard commands to understand what the displayed data means.
  • Following are several useful IBM Redbooks that you can buy or download for free from http://www.redbooks.ibm.com/Redbooks.nsf/portals/Power?Open:
    1. Understanding IBM pSeries Performance and Sizing (new version SG24-4810-1) 400 pages.
    2. For Performance tuning on pSeries and AIX - Database Performance on AIX in the DB2 UDB and Oracle Environments (SG24-5511) 450 pages. The techies bible for tuning these databases for high performance.
    3. AIX 5L Performance Tools Handbook (SG24 6039) 950 pages - All the latest tools for AIX5L including truss and WLM.
    4. PowerVM Virtualization on IBM System p: Introduction and Configuration Fourth Edition - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247940.html
    5. AIX 5L Practical Performance Tools and Tuning Guide - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246478.html
    6. AIX Performance Management Guide - http://publib.boulder.ibm.com/infocenter/systems/scope/aix/index.jsp?topic=/com.ibm.aix.doc/doc/base/performance.htm&tocNode=toc:com.ibm.aix.doc/aix/7/
    7. XXX

Question 20: Very old question about nmon 10 and WPAR stats removed

  • Removed

Question 21: TOP process stats get switched on when I request AIX Asynchronous I/O stats?

  • This is working as normal. To get the AIX aioserver stats the details of all processes have to be collected, sorted and searched.
Changed lines 259-335 from:

Question 23: nmon2rrd fails, please fix it?

You have been supplied with the source code for nmon2rrd and it is supplied as a "toolbox". This means users are expected to come up with fixed rather than the original developer. Note there are updated versions from users on the nmon download site - well done guys.

Question 24: NANQ and INF?

These are output when calculations within nmon have gone wrong. Typically, when dividing by zero. NANQ means "Not a number" and INF means infinite. Some times this can happen due to rounding errors but mostly it is a bug or that numbers a have overflowed the C data types.

Question 25: nmon and AIX commands do not agree?

See question 26. A lot of this happens with nmon 10 and the Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partition. Some of it is because the AIX commands are very unclear about what they are reporting. What was CPU numbers can now be physical CPU, Logical CPU or Virtual CPU numbers and the documentation is unclear. So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.

Question 26: nmon reports more than 100% for a process - clearly it is wrong?

Unlike AIX commands, nmon reports the CPU use of a process per CPU. If your process is, for example, taking 250% then it is using 2.5 CPUs and must be multiple threaded. This is far better than the AIX tools because the percentages on larger machines make it very hard to determine if a process is using a whole CPU. On a 64 CPU machine a single rogue process uselessly spinning on the CPU takes up 1.56% of the total CPU - this makes it very unclear what is going on.

Question 27: On AIX the disk adapter are wrong?

nmon just outputs what it gets from the libperfstat library. For multipath I/O it is often the disk to adapter mapping reflects the order of disk discovery rather than some balanced view. This is an AIX problem and not nmon's fault. To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo - and the adapt sample program.

If you don't like the way libperfstat reports the adapter stats raise a PMR and refer to the adapt sample - as you will get no where reporting nmon errors.

Question 28: on AIX the adapter busy goes over 100%. That is impossible surely?

There are no adapter stats in AIX (see above). They are derived from the disk stats. The adapter busy% is simply the sum of the disk busy%.

So if the adapter busy% is, for example, 350% then you have 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy or 14 disks at 25% or ....

There is no way to determine the adapter busy and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably no real OS) and we don't run nmon of these adapter CPUs to find out what they are really doing!!

Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?

As I don't have access to such machines this is not going to happen. There is also a problem that IBM gives me access to the current hardware because nmon is seen as a competitive advantage. If this was ported to every UNIX then I would not be allowed this access.

Question 30: What about nmon for Windows?

Now you must be joking.

Question 31: Seeing double the number of CPUs?

This is due to the SMT feature of the POWER5 chip, where each CPU (core) runs two processes at the same time. this gives you a 40% boost in performance for most commercial workloads and it s really "good thing". You need to read up on SMT or get yourself a presentation from IBM on the subject.

Question 32: 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext ?

You start nmon and get:

    nmon for AIX5 exec(): 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext

First lets make this very clear - this is an AIX "feature" and not due to user level code like nmon. The AIX loader is failing to load the NFS kernel extension. I looked up this error in the IBM problem database and found 10 hits of others reporting this issue with other tools (i.e. not just an nmon problem).

    PMR 76818, 000, 738 - AIX: after 64bit switch over, NFS error
    PMR 53438, 499, 000 - questions about starting rpc.mountd
    PMR 66239, 070, 724 - W4F 4 command showmount missing
    PMR 43814, 019, 000 - unable to mount - nfs_kdes.ext linked to wrong ext
    PMR 82641, L6Q, 000 - 0509-022 Cannot load module, NFS error

From the first and last PMRs above: The suggested fix is changing soft link

    /usr/lib/drivers/nfs_kdes.ext -> /usr/lib/drivers/nfs_kdes_full.ext
    to
    /usr/lib/drivers/nfs_kdes.ext -> /usr/lib/drivers/nfs_kdes_null.ext

"it seems that if you install some "DES" file of the expansion pack, it will relink your "nfs_kdes.ext" to "nfs_kdes_full.ext". This extension, however, does not load on 64bit (presumably 64bit AIX kernel). That's why you have to relink to fs_kdes_null.ext."...PMR 88582, 487, 000 DES fileset e.g. bos.crypto

Do the following:

    cd /usr/lib/drivers
    rm nfs_kdes.ext
    ln -s /usr/lib/drivers/nfs_kdes_null.ext nfs_kdes.ext

I strongly suggest you contact AIX support to confirm this is a sensible resolution to the issue, before continuing - just in case there are other side effects.

Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?

Don't do it. AIX is very good at looking after itself and self tuning. I have seen rookie systems admin nearly halt a machine by making "improvements". Go on a course or read the AIX performance Redbooks from http://www.redboooks.ibm.com but don't just try changing things unless you first of all have a problem and second know what you are doing and have practiced on a non-production machine or LPAR. See the AIX Wiki What To Do After Installation hints at

    http://www-941.haw.ibm.com/collaboration/wiki/display/WikiPtype/Basic+Setup

Question 34: CPU wait is too high, how can I reduce it?

to:

Question 23: nmon2rrd fails, please fix it?

  • nmon2rrd is a C program that takes nmon files and changes the data ready for the excellent RRDTOOL which can be used to generate graphs in .gif files for displaying on a webserver.
  • You have been supplied with the source code for nmon2rrd and it is supplied as a "toolbox".
  • This means users are expected to come up with fixes rather than the original developer.
  • Note there are updated versions from users on the nmon download site - well done guys.

Question 24: What are NANQ and INF?

  • These are output when calculations within nmon have gone wrong.
  • Typically, when dividing by zero. NANQ means "Not a number" and INF means infinite.
  • Sometimes this can happen due to rounding errors but mostly it is a bug or the numbers have overflowed the C data types.
  • When nmon uses printf to display the invalid number it outputs these strings instead.
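
To see where such values come from, the sketch below mimics what C floating point division does when the divisor is zero and shows one way a tool reading the data could guard against it. It is an illustration only, not nmon code: c_style_divide and safe_percent are hypothetical names, and the sign handling is simplified (it ignores negative zero).

```python
import math

def c_style_divide(a, b):
    """Mimic C floating point division: zero divisors give inf or nan."""
    if b == 0:
        if a == 0:
            return math.nan                 # 0.0 / 0.0 is "Not a number"
        return math.copysign(math.inf, a)   # x / 0.0 is plus or minus infinity
    return a / b

def safe_percent(part, total):
    """Percentage that falls back to 0.0 instead of passing on inf or nan."""
    value = c_style_divide(part, total) * 100.0
    return value if math.isfinite(value) else 0.0

print(c_style_divide(0.0, 0.0), c_style_divide(5.0, 0.0))
print(safe_percent(5.0, 0.0), safe_percent(1.0, 4.0))
```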

Question 25: Old nmon version question: nmon and AIX commands do not agree?

  • A lot of this happens with nmon 10 and the Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partition.
  • Some of it is because the AIX commands are very unclear about what they are reporting.
  • What was CPU numbers can now be physical CPU, Logical CPU or Virtual CPU numbers and the documentation is unclear.
  • So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.
  • Also see question 26.

Question 26: nmon reports more than 100% for a process - clearly it is wrong?

  • Unlike AIX and some Linux commands, nmon reports the CPU utilisation of a process per CPU (the commands report as a percentage of all CPUs).
  • If your process is, for example, taking 250% then it is using 2.5 CPUs and must be multi-threaded as it is using more than one CPU.
  • This is far better than the commands because the percentages on larger machines make it very hard to determine if a process is using a whole CPU.
  • On a 64 CPU machine a single rogue process uselessly spinning on the CPU takes up 1.56% of the total CPU - this makes it very unclear what is going on.
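
The arithmetic behind the two ways of reporting is simple enough to sketch. The function names below are made up for illustration; the numbers are the ones used in this answer.

```python
def cpus_in_use(per_cpu_percent):
    """nmon style: 100% means one whole CPU, so 250% means 2.5 CPUs."""
    return per_cpu_percent / 100.0

def share_of_machine(per_cpu_percent, cpu_count):
    """Whole-machine style: the same usage as a percentage of all CPUs."""
    return per_cpu_percent / cpu_count

print(cpus_in_use(250.0))             # 2.5 CPUs
print(share_of_machine(100.0, 64))    # one spinning process on 64 CPUs: 1.5625
```

A process pinned at 100% in nmon is immediately recognisable as using a whole CPU, while the same process shows up as roughly 1.56% in a whole-machine view.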

Question 27: On AIX the disk adapters are wrong?

AIX

  • nmon just outputs what it gets from the libperfstat library.
  • For multipath I/O the disk to adapter mapping often reflects the order of disk discovery rather than some balanced view.
  • This is an AIX problem and not nmon's fault.
  • To list what nmon is extracting from the libperfstat library you can use the sample code and precompiled for AIX 5.3 binaries from the Roll Your Own Wiki page at ryo - and the adapt sample program: https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool
  • If you don't like the way libperfstat reports the adapter stats raise a PMR and refer to the adapt sample - as you will get no where reporting nmon errors.

Question 28: On AIX the adapter busy goes over 100%. That is impossible surely?

  • There are no adapter stats in AIX (see above). They are derived from the disk stats. The adapter busy% is simply the sum of the disk busy%.
  • So if the adapter busy% is, for example, 350% then you have 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy or 14 disks at 25% or ....
  • There is no way to determine the adapter busy and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably with no real OS) and we don't run nmon on these adapter CPUs to find out what they are really doing!
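
Because the adapter figure is only a sum, very different disk loads give the same number. A minimal sketch of that derivation, with a made-up function name and made-up disk figures:

```python
def adapter_busy(disk_busy_percents):
    """Adapter busy% as described above: the sum of the busy% of its disks."""
    return sum(disk_busy_percents)


print(adapter_busy([50.0] * 7))   # 7 disks at 50% busy
print(adapter_busy([25.0] * 14))  # 14 disks at 25% busy
```

Both calls return 350.0, so the adapter busy% on its own cannot tell you how many disks are busy or how busy each one is.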

Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?

  • As I don't have access to such machines this is not going to happen.
  • There is also the problem that IBM gives me access to the current hardware because nmon is seen as a competitive advantage. If nmon were ported to every UNIX then I would not be allowed this access.
  • There have been attempts to port nmon for Linux to other operating systems but they have not been continued after a year or so.

Question 30: What about nmon for Windows?

  • Now you must be joking.
  • This does get asked a couple of times a year.
  • The real problems are:
    1. How would the stats be extracted from Windows by a C program, given there is no libperfstat or /proc?
    2. The stats would be completely different and, for AIX/UNIX/Linux performance people, very hard or impossible to understand.
    3. Given point 2, none of the graphing tools would work.

Question 31: Seeing double the number of CPUs on my POWER server?

  • This is a POWER based machine question
  • This is due to the SMT feature of the POWER5 chip (and later POWER chips), where each CPU (core) runs two hardware threads at the same time, so each core shows up as two CPUs.
  • This gives you a 40% boost in performance for most commercial workloads and it is really a "good thing".
  • You need to read up on SMT or get yourself a presentation from IBM on the subject.

Question 32: Very old nmon version for AIX: question about NFS driver failures removed

  • Removed

Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?

  • Don't do it.
  • AIX is very good at looking after itself and self tuning. I have seen rookie system administrators nearly halt a machine by making "improvements".
  • Go on a course or read the AIX performance Redbooks from http://www.redbooks.ibm.com but don't just try changing things unless you first of all have a problem, and second know what you are doing and have practiced on a non-production machine or LPAR.

Question 34: CPU wait is too high, how can I reduce it?

Added lines 368-369:
November 22, 2016, at 12:43 AM by 127.0.0.1 -
Changed lines 3-4 from:
 Frequently Asked Questions
to:

This is a work in progress - November 2016


Changed lines 144-149 from:
  • nmon crashes when it starts in collecting-to-a-file mode.
  • nmon Analyser does not work - because the nmon file is empty or incomplete
  • Can we have a new feature XYZ - and it's already implemented, so read the nmon -h output
  • How do I interpret nmon output - first do your homework by learning UNIX and Linux performance statistics: read the command manuals, take a course, spend 5 years benchmarking
to:
  1. nmon crashes when it starts in collecting-to-a-file mode.
  2. nmon Analyser does not work - because the nmon file is empty or incomplete
  3. Can we have a new feature XYZ - and it's already implemented, so read the nmon -h output
  4. I have a problem with the nmon options - turns out they can't read nmon -h, which states -f or -F MUST be the first option on the line
  5. How do I interpret nmon output - first do your homework by learning UNIX and Linux performance statistics: read the command manuals, take a course, spend 5 years benchmarking
Changed lines 165-183 from:

Question 11: Can I decide the filename it saves data too?

Answer: Use nmon -h and check out the -F <file> option Question 12: What is the default output filename?

<hostname><Year><Month><Day><Hours><Minute>.nmon Notes:

    This has been very carefully chosen so that a directory of nmon files will sort in machine and then time order. So you can find the data file you want in a simple way.
    Many people needlessly make up their own names via scripts and date commands - a pointless waste of time.
    One side effect is that, if two nmon captures are started in the same minute they might use the same filename, so stagger the start up by 61 seconds.

Question 13: I want nmon output piped into a further command, how?

Answer: Use a FIFO and the -F option.

to:

Question 11: Can I decide the filename nmon saves data to?

  • Use nmon -h and check out the -F <file> option which must be the first option on the line

Question 12: What is the default output filename?

  • <hostname>_<Year><Month><Day><Hours><Minute>.nmon
  • Notes:
    • This has been very carefully chosen after years of experience
    • A directory of nmon files will sort in machine and then date+time order. So you can find the data file you want in a simple way.
    • Many people needlessly make up their own names via scripts and date commands that will not be in any sensible order = a pointless waste of time.
    • One side effect is that, if two nmon captures are started in the same minute they might use the same filename, so stagger the start up by 61 seconds.
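
The sorting behaviour and the same-minute collision can be sketched as follows. This is an illustration only: the function name is made up, and the exact field widths (a four-digit year and two-digit month, day, hour and minute) are an assumption, so check the names your nmon version actually produces.

```python
from datetime import datetime


def default_name(hostname, started):
    """Build a name in the <hostname>_<Year><Month><Day><Hours><Minute>.nmon pattern.

    Field widths are assumed (4-digit year, 2-digit other fields).
    """
    return f"{hostname}_{started:%Y%m%d%H%M}.nmon"


# A plain alphabetical sort groups the files by machine, then by date and time.
names = [
    default_name("lparB", datetime(2016, 11, 22, 9, 14)),
    default_name("lparA", datetime(2016, 11, 22, 0, 43)),
    default_name("lparA", datetime(2016, 11, 21, 8, 33)),
]
print(sorted(names))

# Two captures started within the same minute end up with the same name.
first = default_name("lparA", datetime(2016, 11, 22, 0, 43, 5))
second = default_name("lparA", datetime(2016, 11, 22, 0, 43, 50))
print(first == second)
```

The hostnames and timestamps are invented for the example. Because seconds are not part of the name, staggering the start by 61 seconds guarantees a different minute.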

Question 13: I want nmon output piped into a further command, how?

  • Use a FIFO and the -F option.


Changed lines 187-198 from:

If you are doing this with the online data output, I think you are barking mad but some people are still trying it. Question 14: Why do you support all these old unsupported AIX versions?

Answer: You would be amazed at what versions are running out there. I guess it is a case of - "if it isn't broken don't touch it". nmon can also help when planned server consolidation from these old version to, for example, micro-partitions on newer hardware. Question 15: What if I want support?

Answer: You have a few options:

    given me money (and I have no problem with this) or
    pay for and use Performance Toolbox/6000 which can do most of nmon and lots more too.
    I have agreement in principle that nmon support can be added to an existing AIX Support contract for an extra fee. So far no one as far as I know has signed up for this. If interested get in touch with your AIX Support channel and ask them to get in contact with Nigel.
to:


  • If you are doing this with the online data output, I think you are "barking mad" but some people are still trying it.

Question 14: Why do you support all these old unsupported AIX versions?

  • You would be amazed at what AIX versions are running out there!
  • I guess it is a case of - "if it isn't broken don't touch it".
  • nmon can also help when planning server consolidation from these old versions to, for example, micro-partitions on newer hardware.

Question 15: What if I want support?

  • You have a few options:
    • Give me money (and I have no problem with this) or
    • Pay for and use the IBM Tivoli Performance Monitoring product with support
    • Pay for and use PM for AIX, a remote service to which your server's performance data is sent and which generates all the graphs for you to view online.

AIX

  • nmon for AIX is a fully supported AIX command so you can raise an IBM problem report (PMR). However, you can't really ask for help with post-processing graphing tools that are not part of AIX.

Linux

  • nmon for Linux is becoming part of the popular distributions - if you have paid for support you could request help
  • You can raise bugs on the sourceforge.net website for the nmon project: https://sourceforge.net/projects/nmon/

If it is something fairly simple you could ask a question on the IBM Performance tools Forum: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000749

November 21, 2016, at 09:14 AM by 127.0.0.1 -
Added line 9:
  • Answers in RED are for very old versions of nmon for AIX
Changed lines 79-80 from:

Question 1: Which nmon for my version of AIX or Linux?

to:

Question 1: Which nmon for my version of AIX or Linux?

Changed lines 91-92 from:

Question 2: nmon crash shortly after starting a data capture please send me the next version?

to:

Question 2: nmon crashes shortly after starting a data capture, please fix this and send me the next version?

Added line 97:
Changed lines 107-137 from:

Answer: First check it is executable (this gets switched off by FTP). Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH. Many people on AIX use /usr/local/bin and make sure the root user includes this in their $PATH. Question 5: Can you add the monitoring tape drive on AIX?

Answer: No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.

I have a little campaign running to get this tape stats feature available in AIX. Please, complain to your AIX support by raising a PMR - only by popular demand can this get high enough priority for the AIX developers to add this feature that we have been requesting for years! If you really want to "wind them up" say that you think Solaris now has tape stats.

One word of caution - if you are using a tape management system that does serverless backup - i.e. the data is transfered directly from client machines to the tape drives over fibre channel then the tape management system's AIX operating system never actually touches the data - so this can never be recorded by nmon.

There may be tape system supplied tools or APIs for getting tape drive stats. If you come across these please let Nigel know. We could use these to generate nmon style data that can be merged into the nmon data for analysis using the nmon external data collector features.

The same is true for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP. Question 6: Can I get the adapters stats from other tools?

Answer: Not in AIX 4 - there are no adapter stats in AIX. This is now available in AIX 5 via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. Question 7: When I start nmon 9 on a system that it use to run fine I know get an error message?

The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon? Answer: These are bugs in AIX and not nmon -there are fixes available. Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a. Question 8: What is the most reported error for nmon?

Answer: See previous question - these AIX bugs cover 70% of nmon complaints. Question 9: Can you add the monitoring of process priority?

Answer: This is only available from the AIX 5.1 onwards Question 10: on AIX, nmon 9 does not run, please fix?

With reports like: read error: No such device or address nmon file=nmon.c line=1278 version=XXX Answer: In 95% of the time it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does no match what is actually running, you get this message.

to:

Linux

  • First check it is executable (this gets switched off by FTP).
  • Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH.

AIX

  • nmon since AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version is a default install and the starting shell script can be found in /usr/bin/nmon - it actually starts the executable called nmon_topas.

Question 5: Can you add the monitoring tape drive on AIX?

AIX

  • No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.
  • Yes - if your tape drive is Fibre Channel connected, it is very common to have it connected to a different FC adapter to allow performance settings that suit the tape drive = streams of large blocks.
    • In this case, use the Adapter stats using the ^ key or -^ startup option to monitor the tape(s).

Linux

  • No FC Adapter options for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP.

Question 6: Can I get the adapters stats from other tools?

AIX

  • Not in AIX 4 - there are no adapter stats in this AIX.
  • This is now available in AIX 5 and higher via the libperfstat library so programmers can get this information - but be warned that this is data derived from the connected disks (NOT tape drives) because there are no adapter stats. XXX

Question 7: When I start nmon 9 on a system where it used to run fine I now get an error message?

  • The error is something about "lslpp" on AIX 5.1 from about ML03 onwards - or - WLM stats go missing after upgrading to AIX 5.2 ML5 - can you fix nmon?
  • These are bugs in AIX and not nmon - there are fixes available.
  • Please report these problems to your AIX support channel and not me. nmon 10 has also been backported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a.

Question 8: What is the most reported error for nmon?

  • nmon crashes when it starts in collecting-to-a-file mode.
  • nmon Analyser does not work - because the nmon file is empty or incomplete
  • Can we have a new feature XYZ - and it's already implemented, so read the nmon -h output
  • How do I interpret nmon output - first do your homework by learning UNIX and Linux performance statistics: read the command manuals, take a course, spend 5 years benchmarking

Question 9: Can you add the monitoring of process priority?

  • Available from the AIX 5.1 onwards

Question 10: nmon on AIX, nmon 9 does not run, please fix?

  • With reports like:
  • read error: No such device or address
  • nmon file=nmon.c line=1278 version=XXX
  • In 95% of cases it is because AIX was upgraded or a maintenance level was added but the system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does not match what is actually running, you get this message.
Added lines 160-164:
November 21, 2016, at 08:41 AM by 127.0.0.1 -
Added lines 7-12:

Colour key:

  • Answers in GREEN are related to nmon for AIX
  • Answers in BLUE are for nmon for Linux
  • Answer in BLACK apply to both versions
Changed lines 80-87 from:

Answer:

    On AIX with these or later versions: AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version You should run the nmon that comes with AIX and is installed by default.
    It is strongly recommended if you have problems to first add all available service packs for your AIX release as this removes 99% of problems.
    If you have earlier AIX versions then you can run nmon classic downloadable from
    On Linux go to the nmon for Linux website (http://nmon.sourceforge.net) todownload nmon. It is compiled for 50 different platfomr (POWER, x86, x86_64 and Mainframe ) and distrubution combinations. If yours is not on the list or you have a newer Linux version you can now compile it up yourself.
to:

  • On AIX with these or later versions (AIX 5.3 TL09+, AIX 6.1 TL02+ and any version of AIX 7), you should run the nmon that comes with AIX and is installed by default.
  • It is strongly recommended if you have problems to first add all available service packs for your AIX release as this removes 99% of problems.
  • If you have earlier AIX versions then you can run nmon classic downloadable from XXX

  • On Linux go to the nmon for Linux website (http://nmon.sourceforge.net) to download nmon. It is compiled for 50 different platform (POWER, x86, x86_64 and Mainframe) and Linux distribution combinations.

If your combination is not on the list or you have a newer Linux version you can now compile it up yourself.

Changed lines 91-93 from:

Answer: When you are capturing data to a file, the nmon tool disconnects from the shell, to ensure that it continues running even if you log out. This means that nmon can appear to crash but it is still running in the background. Use: ps ef | grep nmon to see the process still running. Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

to:
  • When you are capturing data to a file, the nmon tool disconnects from the shell, to ensure that it continues running even if you log out.
  • This means that nmon can appear to crash but it is still running in the background.
  • Use: ps -ef | grep nmon to see the nmon process still running.

Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

November 21, 2016, at 08:34 AM by 127.0.0.1 -
Added lines 70-71:

Changed line 82 from:

{+Question 2: nmon crash shortly after starting a data capture please send me the next version?+]

to:

Question 2: nmon crash shortly after starting a data capture please send me the next version?

November 21, 2016, at 08:33 AM by 127.0.0.1 -
Added lines 1-590:

nmon for Linux and AIX Frequently Asked Questions (FAQ)

 Frequently Asked Questions

The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.

Summary of the questions:

  • Question 1: Which nmon for my version of AIX or Linux?
  • Question 2: nmon crash shortly after starting a data capture please send me the next version?
  • Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?
  • Question 4: All I get is "nmon not found"?
  • Question 5: Can you add the monitoring tape drive on AIX?
  • Question 6: Can I get the adapters stats from other tools?
  • Question 7: When I start nmon 9 on a system that it use to run fine I know get an error message?
  • Question 8: What is the most reported error for nmon?
  • Question 9: Can you add the monitoring of process priority?
  • Question 10: on AIX, nmon 9 does not run, please fix?
  • Question 11: Can I decide the filename it saves data too?
  • Question 12: What is the default output filename?
  • Question 13: I want nmon output piped into a further command, how?
  • Question 14: Why do you support all these old unsupported AIX versions?
  • Question 15: What if I want support?
  • Question 16: Why don't you add a Java front end to nmon and get graphical output?
  • Question 17: The command line options don't seem to work right for file capture?
  • Question 18: What is paging to a filesystem?
  • Question 19: Where can I get nmon and further information?
  • Question 20: nmon crashes after about 200 snapshots on AIX?
  • Question 21: TOP process stats get switched on when I request Asynchronous I/O stats?
  • Question 23: nmon2rrd fails, please fix it?
  • Question 24: NANQ and INF?
  • Question 25: nmon and AIX commands do not agree?
  • Question 26: nmon reports more than 100% for a process - clearly it is wrong?
  • Question 27: On AIX the disk adapter are wrong?
  • Question 28: on AIX the adapter busy goes over 100%. That is impossible surely?
  • Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?
  • Question 30: What about nmon for Windows?
  • Question 31: Seeing double the number of CPUs?
  • Question 32: 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext ?
  • Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?
  • Question 34: CPU wait is too high, how can I reduce it?
  • Question 35: On AIX, free memory is near zero, how do I free more memory?
  • Question 36: How can I set numperm better?
  • Question 37: What format is the nmon output file?
  • Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?
  • Question 39: nmon does not work on my Linux machine!!
  • Question 40: When do we get nmon 10 for Linux?
  • Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rvxt, putty, VNC, (whatever you have)?
  • Question 42: I have 2400 disk (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?
  • Question 43: Adapter stats and IOADAPT is not saved to the nmon file seems to be missing with AIX 5.1?
  • Question 44: What is CharIO (a column of the TOP processes stats)?
  • Question 45: On Linux the disk stats are all doubled?
  • Question 46: On AIX the disk seem to be mostly on the first adapter?
  • Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?
  • Question 48: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.
  • Question 49: I want to collect data every second and then see weekly and monthly reports. How?
  • Question 50: nmon will not start on AIX 5.1 due to a libperfstat error?
  • Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?
  • Question 52: The Disk Busy stats are missing on AIX
  • Question 53: Sort order problems with massive nmon output files.
  • Question 54: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"
  • Question 54: AIX 5.3 updated but then nmon gives "Assert Failure"
  • Question 55: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats
  • Question 56: Does nmon capture point in time stats or averages?
  • Question 57: Why is the Process memory percentage zero? (same for System and User percent)
  • Question 100: When will nmon collect data from lots of machines or LPARs?
  • Question 101: When will nmon collect data like "topas -C"?

Question 1: Which nmon for my version of AIX or Linux?

Answer:

    On AIX with these or later versions: AIX 5.3 TL09+ and AIX 6.1 TL02+ and AIX 7 any version You should run the nmon that comes with AIX and is installed by default.
    It is strongly recommended if you have problems to first add all available service packs for your AIX release as this removes 99% of problems.
    If you have earlier AIX versions then you can run nmon classic downloadable from
    On Linux go to the nmon for Linux website (http://nmon.sourceforge.net) todownload nmon. It is compiled for 50 different platfomr (POWER, x86, x86_64 and Mainframe ) and distrubution combinations. If yours is not on the list or you have a newer Linux version you can now compile it up yourself.

{+Question 2: nmon crash shortly after starting a data capture please send me the next version?+]

Answer: When you are capturing data to a file, the nmon tool disconnects from the shell, to ensure that it continues running even if you log out. This means that nmon can appear to crash but it is still running in the background. Use: ps ef | grep nmon to see the process still running. Question 3: I have a problem with nmon running on AIX 4.0.3 (or any really old AIX versions)?

Answer: Hard luck I will actively help get AIX 5 bugs fixed but older versions are very much less interesting. In particular, on AIX 4.1.5 the TOP processes does not work but I am not going to fix it unless some one offers me a bribe in hard currency Question 4: All I get is "nmon not found"?

Answer: First check it is executable (this gets switched off by FTP). Second, if you are the root user, you have to name the executable directly with the full path name or (if in the current working directory) ./nmon or put it into a directory in your $PATH. Many people on AIX use /usr/local/bin and make sure the root user includes this in their $PATH. Question 5: Can you add the monitoring tape drive on AIX?

Answer: No - the data is not available. The best you can do is to watch the disks and guess what the tape is doing. The adapter statistics is only adding up the attached disks - so it does not help. You can guess at the tape drive I/O rates by looking at the disk I/O rates - after all this is where the data is coming from but it is only approximate and does not account for memory caching of data.

I have a little campaign running to get this tape stats feature available in AIX. Please, complain to your AIX support by raising a PMR - only by popular demand can this get high enough priority for the AIX developers to add this feature that we have been requesting for years! If you really want to "wind them up" say that you think Solaris now has tape stats.

One word of caution - if you are using a tape management system that does serverless backup - i.e. the data is transfered directly from client machines to the tape drives over fibre channel then the tape management system's AIX operating system never actually touches the data - so this can never be recorded by nmon.

There may be tape system supplied tools or APIs for getting tape drive stats. If you come across these please let Nigel know. We could use these to generate nmon style data that can be merged into the nmon data for analysis using the nmon external data collector features.

The same is true for Linux - unless you know the /proc file to find tape stats. In which case let Nigel know ASAP. Question 6: Can I get the adapters stats from other tools?

Answer: Not in AIX 4 - there are no adapter stats in AIX. This is now available in AIX 5 via the libperfstat library so programmers can get this information - but a warning this is derived data from the connected disks (NOT tape drives) because there is no adapter stats. Question 7: When I start nmon 9 on a system that it use to run fine I know get an error message?

The error is something about "lslpp" AIX 5.1 about ML03 onwards - or - WLM stats go missing - after upgrading to AIX 5.2 ML5 - can you fix nmon? Answer: These are bugs in AIX and not nmon -there are fixes available. Please report these problems to your AIX support channel and not me. nmon 10 has also been back ported to AIX 5.1 and AIX 5.2 and has code to work around these bugs and can be used instead of nmon9a. Question 8: What is the most reported error for nmon?

Answer: See previous question - these AIX bugs cover 70% of nmon complaints. Question 9: Can you add the monitoring of process priority?

Answer: This is only available from the AIX 5.1 onwards Question 10: on AIX, nmon 9 does not run, please fix?

With reports like: read error: No such device or address nmon file=nmon.c line=1278 version=XXX Answer: In 95% of the time it is because AIX was upgraded or a maintenance level added but the AIX/system was not rebooted. It is very easy to miss the "You must reboot" message in the gallons of installp output. The reboot is required because the AIX kernel image has been updated and the reboot is the only way to activate the new /unix file. nmon reads the /unix file to find kernel data structure addresses but if the /unix file does no match what is actually running, you get this message. You can also get really weird effects, if you have messed up LIBPATH. Question 11: Can I decide the filename it saves data too?

Answer: Use nmon -h and check out the -F <file> option Question 12: What is the default output filename?

<hostname><Year><Month><Day><Hours><Minute>.nmon Notes:

    This has been very carefully chosen so that a directory of nmon files will sort in machine and then time order. So you can find the data file you want in a simple way.
    Many people needlessly make up their own names via scripts and date commands - a pointless waste of time.
    One side effect is that, if two nmon captures are started in the same minute they might use the same filename, so stagger the start up by 61 seconds.

Question 13: I want nmon output piped into a further command, how?

Answer: Use a FIFO and the -F option.

    mkfifo /tmp/xyz
    nmon -F /tmp/xyz -s 5 -c 300
    your-command </tmp/xyz

If you are doing this with the online data output, I think you are barking mad but some people are still trying it. Question 14: Why do you support all these old unsupported AIX versions?

Answer: You would be amazed at what versions are running out there. I guess it is a case of - "if it isn't broken don't touch it". nmon can also help when planned server consolidation from these old version to, for example, micro-partitions on newer hardware. Question 15: What if I want support?

Answer: You have a few options:

    given me money (and I have no problem with this) or
    pay for and use Performance Toolbox/6000 which can do most of nmon and lots more too.
    I have agreement in principle that nmon support can be added to an existing AIX Support contract for an extra fee. So far no one as far as I know has signed up for this. If interested get in touch with your AIX Support channel and ask them to get in contact with Nigel.

Question 16: Why don't you add a Java front end to nmon and get graphical output?

Answer: I don't have the time. If you can give me a framework for getting C functions to generate graphs, please let me know.

Question 17: The command line options don't seem to work right for file capture?

Answer: The -f, -F, -x, -X or -z option MUST be the first option on the line, and only one of them may be used. This option sets all the other option flags. You can then use the other flags to modify their default behaviour. This has improved with the latest nmon versions.

Question 18: What is paging to a filesystem?

Hopefully, you already understand paging to paging space (also called virtual memory). AIX (and other UNIX versions) page in the read-only code from a program as you start it and as it runs. This is just like paging in from the paging space but is directly from the filesystem; this is also true for shared libraries (which you might not be aware you are using). Also, programs using memory mapped files access the files by simply reading and writing memory addresses - AIX will page in the file pages as necessary and they will get paged back to the filesystem to free up memory or if the program forces it.

Question 19: Where can I get nmon and further information?

Answer: From this Wiki! The data displayed by nmon are similar to the displays generated by the standard AIX commands such as vmstat, iostat, netpmon, df, and sar. Use the AIX manual pages for these standard commands to understand what the displayed data means.

Following are several useful IBM Redbooks that you can buy or download for free from http://www.redbooks.ibm.com/portals/unix:

    Understanding IBM pSeries Performance and Sizing (new version SG24-4810-1) 400 pages.
    For Performance tuning on pSeries and AIX - Database Performance on AIX in the DB2 UDB and Oracle Environments (SG24-5511) 450 pages. The techies bible for tuning these databases for high performance.
    AIX 5L Performance Tools Handbook (SG24 6039) 950 pages - All the latest tools for AIX5L including truss and WLM.
    PowerVM Virtualization on IBM System p: Introduction and Configuration Fourth Edition - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247940.html
    AIX 5L Practical Performance Tools and Tuning Guide - http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246478.html
    AIX Performance Management Guide - http://publib.boulder.ibm.com/infocenter/systems/scope/aix/index.jsp?topic=/com.ibm.aix.doc/doc/base/performance.htm&tocNode=toc:com.ibm.aix.doc/aix/7/

Question 20: nmon crashes after about 200 snapshots on AIX?

If you request Workload Manager stats and have WLM switched off then, due to bugs in AIX and a huge memory leak in the libwlm library, nmon will grow in size every time it fails to access the WLM stats until it hits 256 MB and will then crash. This is fixed in nmon 11 by switching off WLM stats after a few failed attempts.

Question 21: TOP process stats get switched on when I request Asynchronous I/O stats?

This is working as normal. To get the aioserver stats the details of all processes have to be collected, sorted and searched. Having paid the CPU cycles for the TOP process stats you may as well see them on the screen or in the output file, so nmon automatically switches them on for you at no additional charge.

Question 23: nmon2rrd fails, please fix it?

You have been supplied with the source code for nmon2rrd and it is supplied as a "toolbox". This means users are expected to come up with fixes rather than the original developer. Note there are updated versions from users on the nmon download site - well done guys.

Question 24: NANQ and INF?

These are output when calculations within nmon have gone wrong, typically when dividing by zero. NANQ means "Not a number" and INF means infinite. Sometimes this can happen due to rounding errors but mostly it is a bug or numbers have overflowed the C data types.

Question 25: nmon and AIX commands do not agree?

See question 26. A lot of this happens with nmon 10 and Shared Processor Logical Partitions (SPLPAR) - what marketing calls Micro-partitions. Some of it is because the AIX commands are very unclear about what they are reporting. What were CPU numbers can now be physical CPU, logical CPU or virtual CPU numbers and the documentation is unclear. So you may not be comparing "like with like". This has been improved in nmon 11 - please report further issues from nmon 11 onwards.

Question 26: nmon reports more than 100% for a process - clearly it is wrong?

Unlike AIX commands, nmon reports the CPU use of a process per CPU. If your process is, for example, taking 250% then it is using 2.5 CPUs and must be multi-threaded. This is far better than the AIX tools because the percentages on larger machines make it very hard to determine if a process is using a whole CPU. On a 64 CPU machine a single rogue process uselessly spinning on the CPU takes up 1.56% of the total CPU (100% / 64) - this makes it very unclear what is going on.

Question 27: On AIX the disk adapters are wrong?

nmon just outputs what it gets from the libperfstat library. For multipath I/O it is often the case that the disk to adapter mapping reflects the order of disk discovery rather than some balanced view. This is an AIX problem and not nmon's fault. To list what nmon is extracting from the libperfstat library you can use the sample code and the binaries precompiled for AIX 5.3 from the Roll Your Own Wiki page at ryo - see the adapt sample program.

If you don't like the way libperfstat reports the adapter stats, raise a PMR and refer to the adapt sample - as you will get nowhere reporting nmon errors.

Question 28: On AIX the adapter busy goes over 100%. That is impossible surely?

There are no adapter stats in AIX (see above). They are derived from the disk stats. The adapter busy% is simply the sum of the disk busy%.

So if the adapter busy% is, for example, 350% then you have 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy or 14 disks at 25% or ....
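
As a rough illustration of the sum described above, the following shell sketch adds up disk busy percentages per adapter. The input layout (adapter name, disk name, busy%), the device names and the sample values are all assumptions made for this example - they are not nmon or libperfstat output.

```shell
# Sum disk busy% per adapter.
# Assumed input: one line per disk as "adapter disk busy_percent" (hypothetical layout).
adapter_busy() {
    awk '{ busy[$1] += $3 }
         END { for (a in busy) printf "%s %.1f\n", a, busy[a] }' | sort
}

# Hypothetical sample: four disks on fcs0 (100 + 100 + 100 + 50 = 350) and one on fcs1.
adapter_busy <<'EOF'
fcs0 hdisk1 100
fcs0 hdisk2 100
fcs0 hdisk3 100
fcs0 hdisk4 50
fcs1 hdisk5 25
EOF
```

The first adapter comes out at 350%, which says nothing more than "the equivalent of 3.5 fully busy disks hang off this adapter".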

There is no way to determine the adapter busy and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably no real OS) and we don't run nmon on these adapter CPUs to find out what they are really doing!!

Question 29: What about nmon for HP/UX, Solaris on Sparc or x86 or Linux on Itanium?

As I don't have access to such machines this is not going to happen. There is also a problem that IBM gives me access to the current hardware because nmon is seen as a competitive advantage. If this was ported to every UNIX then I would not be allowed this access.

Question 30: What about nmon for Windows?

Now you must be joking.

Question 31: Seeing double the number of CPUs?

This is due to the SMT feature of the POWER5 chip, where each CPU (core) runs two processes at the same time. This gives you a 40% boost in performance for most commercial workloads and it is a really "good thing". You need to read up on SMT or get yourself a presentation from IBM on the subject.

Question 32: 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext ?

You start nmon and get:

    nmon for AIX5 exec(): 0509-036 Cannot load program /usr/lib/drivers/nfs_kdes.ext

First let's make this very clear - this is an AIX "feature" and not due to user level code like nmon. The AIX loader is failing to load the NFS kernel extension. I looked up this error in the IBM problem database and found 10 hits of others reporting this issue with other tools (i.e. not just an nmon problem).

    PMR 76818, 000, 738 - AIX: after 64bit switch over, NFS error
    PMR 53438, 499, 000 - questions about starting rpc.mountd
    PMR 66239, 070, 724 - W4F 4 command showmount missing
    PMR 43814, 019, 000 - unable to mount - nfs_kdes.ext linked to wrong ext
    PMR 82641, L6Q, 000 - 0509-022 Cannot load module, NFS error

From the first and last PMRs above: The suggested fix is changing soft link

    /usr/lib/drivers/nfs_kdes.ext -> /usr/lib/drivers/nfs_kdes_full.ext
    to
    /usr/lib/drivers/nfs_kdes.ext -> /usr/lib/drivers/nfs_kdes_null.ext

"it seems that if you install some "DES" file of the expansion pack, it will relink your "nfs_kdes.ext" to "nfs_kdes_full.ext". This extension, however, does not load on 64bit (presumably 64bit AIX kernel). That's why you have to relink to fs_kdes_null.ext."...PMR 88582, 487, 000 DES fileset e.g. bos.crypto

Do the following:

    cd /usr/lib/drivers
    rm nfs_kdes.ext
    ln -s /usr/lib/drivers/nfs_kdes_null.ext nfs_kdes.ext

I strongly suggest you contact AIX support to confirm this is a sensible resolution to the issue, before continuing - just in case there are other side effects.

Question 33: Hello, I am new to UNIX and want to tune AIX, what do you recommend?

Don't do it. AIX is very good at looking after itself and self tuning. I have seen rookie system admins nearly halt a machine by making "improvements". Go on a course or read the AIX performance Redbooks from http://www.redbooks.ibm.com but don't just try changing things unless you first of all have a problem and second know what you are doing and have practiced on a non-production machine or LPAR. See the AIX Wiki What To Do After Installation hints at

    http://www-941.haw.ibm.com/collaboration/wiki/display/WikiPtype/Basic+Setup

Question 34: CPU wait is too high, how can I reduce it?

This question is asked a lot and it can mean your CPUs are actually too fast!

The CPU "waiting for I/O" state and its utilisation number (as opposed to User, System and Idle) mean the CPU is idle but has a disk I/O outstanding. Historically this was used to highlight that your application is being held up by slow disks or disk problems. In the Wait for I/O state the CPU is actually free to do other work and is NOT looping waiting for the disk - it in fact actioned the adapter to perform the disk I/O, put the calling process to sleep and carried on. If there is no other process to run, it is in the same loop as in the Idle state, i.e. it is available to do other things. In AIX the processor does one of two things:

    In regular stand-alone machines or a dedicated CPU LPAR, the processor runs a special kernel level process called "wait" from which it can exit very quickly at the arrival of the next interrupt.
    In a micro-partition (Shared Processor LPAR), the processor will after a few microseconds call the Hypervisor to yield the processor to other LPARs.

In benchmarks, Wait for I/O is seen positively as an opportunity - we can throw in more work to boost throughput.

Any workload in which the CPU does comparatively little work compared to the volume of disk I/O is going to give you high Wait for I/O.

If this high Wait for I/O is a sudden change from the normal pattern then it needs investigating and you should make sure as many disks as possible are involved in the disk I/O.

But lots of workloads just run like this - a common example I come across regularly is SAP databases. SAP cleverly caches lots of data but on a large database it has to do lots of disk I/O for particular customer (or whatever) records. Once the data is available it is sent to the SAP application servers, i.e. little work is done on the database.

In fact, faster CPUs would mean even higher wait values.

Question 35: On AIX, free memory is near zero, how do I free more memory?

This is just how AIX works and is perfectly normal. All of memory will be soaked up with copies of filesystem blocks after a reasonable length of time and the free memory will be near zero. AIX will then use the lrud process to keep the free list at a reasonable level. If you see the lrud process taking more than 30% of a CPU then you need to investigate and make memory parameter changes.

Question 36: How can I set numperm better?

You can't. This number just reflects the amount of memory being used for disk blocks - called the buffer cache. It is controlled by three parameters minperm, maxperm and strictperm but these set thresholds and algorithms. The actual numperm number reflects what is actually going on. You will have to find other places for tuning these parameters as it is beyond the scope of this FAQ.

It is also worth noting that the nmon values for numperm and maxperm are based on a percentage of physical memory. The AIX commands report a percentage but not of all memory - they seem to remove some memory that might be something like the memory allocated to the AIX kernel (i.e. it could never be used as cache). Unfortunately this is not documented and the memory size not counted is not available with any public API. So nmon does the best it can but the numbers will not be absolutely the same.

Question 37: What format is the nmon output file?

Plain ASCII text that you can read and edit with vi (but you might hit the 2048 byte line limit on the AIX vi). I use the Open Source vim on AIX to avoid this or do it on Linux.

    The first token on the line tells you what sort of data it is
        AAA lines are basic nmon data about this collection of data
        BBB lines are about the configuration of the machine
        ZZZ lines include the date and time stamp, stored here once to reduce output
        others should be obvious
    the second field is the Timestamp - see the ZZZ section for the actual time
    then there is the data
    each sort of data (CPU, DISK, etc.) has a Header line that describes the columns and the header lines also include the graph titles
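
To make the layout above concrete, here is a small shell sketch that pulls one section out of a capture file and swaps each T number for the time of day found on the matching timestamp line. The sample lines, the section names and the function name are made up for illustration, and the sketch assumes an unsorted file (timestamp lines appear before the data they belong to) - check it against your own file before relying on it.

```shell
# Print the data rows of one section with the T number replaced by the time of day.
# Usage: section_with_times SECTION < file.nmon
section_with_times() {
    awk -F, -v OFS=, -v sec="$1" '
        $1 ~ /^ZZZ/ { when[$2] = $3; next }    # timestamp line: remember T number -> time
        $1 == sec && ($2 in when) {            # data row of the wanted section
            $2 = when[$2]
            print
        }'
}

# Hypothetical capture file contents.
section_with_times CPU_ALL <<'EOF'
AAA,progname,nmon
CPU_ALL,CPU Total,User%,Sys%,Wait%,Idle%
ZZZZ,T0001,10:00:00,03-JAN-2017
CPU_ALL,T0001,12.5,3.0,0.5,84.0
ZZZZ,T0002,10:05:00,03-JAN-2017
CPU_ALL,T0002,40.0,10.0,5.0,45.0
EOF
```

The header line of the section is skipped here because its second field is a title rather than a T number.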

You do not need to sort the nmon output file for nmon2rrd or the Analyser but if you do then you can see the sections more easily for editing.

Question 38: I have collected once a second for 8 hours but I can't get the Analyser to work?

You have 28800 data points and you want to see this on a screen that is, say, 1024 pixels wide !!

    that is about 28 data points per pixel.

My new Thinkpad has 1400 pixels across the screen, so I am down to just 21 data points per pixel

    what were you thinking !!

I think even with the best will in the world, the analyser spreadsheet is going to struggle. On a tiny machine you get about 1.5KB per snapshot and a normal size machine with a few nmon options it is more like 60KB each. At 60KB the maths --> 28800*60KB = 1.6GB. How big is your output file? I hope you have at least 4 GBs of memory in your PC to handle this!

As I hope you know, the nmon file is text and editable with vi (but you might hit the 2048 byte line limit on the AIX vi). I use the Open Source vim on AIX to avoid this or do it on Linux. If you take a look at the file format you should be able to cut down the file size and make a series of files, but each will need the header section that you will find at the top of the file and then a different set of snapshots.

Question 39: nmon does not work on my Linux machine!!

nmon runs on x86 (Intel and AMD), mainframe and POWER processors and on a dozen or so versions of Linux. If you report problems I will need to know which platform and which Linux version plus distro before I can help, so please include these with initial questions.

Question 40: When do we get nmon 10 for Linux?

The Linux & AIX source code for nmon is very different apart from the curses framework and basic approach. AIX gets all the information from system and library calls and in Linux this has to be read from the /proc filesystem. This means the AIX code is more straightforward. So there is no need for Linux and AIX to have the same version number. From nmon version 11, the AIX and Linux user interfaces were made the same and released with the same version number to keep people happy. There was no nmon for Linux version 10.

Question 41: The boxes and lines in nmon do not work right online with: DTterm, xterm, rxvt, putty, VNC, (whatever you have)?

nmon uses curses to handle the displaying of characters on the terminal. This is controlled mostly by your TERM variable setting. The nmon developer tests with all of the above. They work perfectly and they work perfectly all the time. If it does not work for you then you have some setting wrong on your machine or X Windows or have some strange settings for TERM and/or TERMINFO shell variable setting or you are using a duff terminal emulator.

Let me state that again: your system has a problem not nmon.

The TERM shell variable should be set to the terminal emulator you are using.

    If you are using a xterm then TERM should be xterm
    If you are using DTterm then TERM should be dtterm
    If you are using an AIX term then TERM should be set to aixterm
    Get the idea - other combinations are your problem.

Unless you are using a genuine 1970's DEC VT100 you should not be setting TERM to vt100 with more advanced terminal emulators. I remember VT100's well, even found a bug in the firmware once!

The TERMINFO variable should not be set to anything (in fact not set at all). If it is then you or someone has been mucking about with terminfo databases and why are you blaming nmon?

Terminal Emulators:

    xterm works well in black and white.
    aixterm works well and has colour and nmon uses the colour.
    DTterm works well and has colour and nmon uses the colour.
    rxvt and xterm-color combination (see WWW for details on setup, on google.com search for xterm-color and AIX) - this combination also lets vim (the improved vi from Open Source) use syntax highlighting in C code.
    The Windows telnet terminals emulation is very poor indeed and not recommended under any circumstances - you are on your own.
    The best alternative on a Windows PC is putty (see WWW for details and download) and is highly recommended - I use this every day - this will work with TERM set to xterm perfectly.
    VNC is, of course, even better and gives you X windows on a Windows PC at zero cost - again highly recommended.

The -B option starts nmon with no boxes (or colour). Some purists do not like to waste the screen space with the box lines. You could add 'B' to the NMON shell variable to make this automatic: export NMON=B

Question 42: I have 2400 disks (small SAN LUNs) and nmon is slow to collect the stats from so many, can you help?

I guess you are learning the folly of small LUNs and that it makes the machine totally unmanageable. But you are not the first or worst - the record stands at 4500. Some suggestions:

    Have you got more than four paths to each LUN?
        If yes, you need to fix this ASAP as it is bad for performance and terrible for RAS (and I mean really bad).
    Using the -D flag to stop nmon collecting the disk configuration each time can really help the start up time.
    Collect this disk configuration just the once - unless you are changing the disks a lot!!
    You can use nmon User Defined Disk Groups to limit the output but nmon will still have to collect all the data from all the disks and then reduce what is actually reported.
    But the only real solution is to reduce the number of disks you have - yes, I know this is a lot of work but you have a machine setup that can not be managed and that is not viable in the long term.
    Don't blame nmon for highlighting the issue.

I recommend 32 to 64 LUNs and make the disk subsystem do the hard work of spreading the data across disks - i.e. not you. After all, that is what you buy big disk subsystems for and there are better uses of your time and thought.

Question 43: Adapter stats and the IOADAPT section are missing from the nmon file with AIX 5.1?

Correct, this data is not available on AIX 5.1 from the libperfstat library. This also causes a problem with nmon2rrd version 10, which expects the IOADAPT section and crashes. Recommended action: upgrade AIX, as 5.1 is not supported without purchasing extended support.

Question 44: What is CharIO (a column of the TOP processes stats)?

This is the character I/O that a process is generating and it is counted from calls to the read() and write() system calls. I/O started in other ways like Async I/O (commonly used by an RDBMS), paging or memory mapped files is not included. The number is fetched from the AIX kernel using the getprocs64() system call and the structure found in /usr/include/procinfo.h - look for the pi_ioch variable.

Question 45: On Linux the disk stats are all doubled?

nmon collects the data from /proc and displays it. On newer kernels this is the /proc/diskstats file. It was decided a long time ago that hiding data was a very bad idea as it can go wrong and then be very misleading - this is how the ozone hole was missed for 5 years: the algorithm decided the data must be wrong and deleted it from the stats. The Linux disk stats (in three different files and four formats depending on the Linux version - great coding guys!!) report both disk level and disk partition level stats in the same file. nmon just shows you the stats - it is your job to understand them. nmon does not try to filter them, and with LUNs on SAN disks, software RAID and LVM it is much safer to show everything.

Question 46: On AIX the disks seem to be mostly on the first adapter?

nmon now collects the adapter data from AIX libperfstat. This is the sum of the disk stats, added up by knowing which disk is connected to which adapter. This, of course, is complex for multipath I/O disks. AIX seems to build this map from the order in which disks are discovered rather than used. Depending on your initial setup it can often mean that most disks are assigned to the first one or two adapters. Sorry, there is nothing that nmon can do about this. To list what nmon is extracting from the libperfstat library you can use the sample code and the binaries precompiled for AIX 5.3 from the Roll Your Own Wiki page at ryo.

Question 47: On nmon for Linux the CPU Wait for IO number is zero or odd?

This number is not available in the /proc filesystem until the 2.6 kernel and then it appears in the undocumented fields at the end of a line - I have fixed this for the 2.6 kernels in nmon for Linux version 11c.

Question 48: On nmon for Linux the paging details are missing and the PAGE lines for the capture to file are missing.

This data was very hard to locate and now appears in nmon for Linux version 11d onward for the 2.6 kernel. Before this kernel version the data is not present in /proc.

Question 49: I want to collect data every second and then see weekly and monthly reports. How?

Let us take this in simple bite-size chunks:

    First, a practical point: most laptop and PC screens are 1024 x 768 pixels. No matter how many data points you have, you cannot see more than about 800 of them. This is why I recommend about 300 to 400 data captures with nmon to get good looking graphs.
    Second, one second stats for a day give you (60 x 60 x 24) 86400 data points! So OK, let us try one minute stats: then we have 1440 data points, which is still too many. So we need to move to 5 minute captures and we get to a sensible 288 data points and a good looking graph.
    Third, we then collect data for a month: 288 x 31 = 8928 data points - oh dear, far too many data points again!! So now we have to drop down to once an hour data capture (24 x 31) and we have 744 data points, which is only just possible - we had better start thinking about the purchase of a bigger screen.
    If you then want to compare months or have a yearly report ... well you get the idea by now, we are now monitoring 12 hour periods.
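
The arithmetic in the list above is easy to check from the shell. This is only a sketch of the sums - the function name is made up for the example.

```shell
# Number of snapshots for a capture lasting DURATION seconds
# with one snapshot every INTERVAL seconds.
# Usage: snapshots DURATION INTERVAL
snapshots() {
    echo $(( $1 / $2 ))
}

day=$(( 60 * 60 * 24 ))           # 86400 seconds in a day

snapshots "$day" 1                # every second for a day      -> 86400
snapshots "$day" 60               # every minute for a day      -> 1440
snapshots "$day" 300              # every 5 minutes for a day   -> 288
snapshots $(( day * 31 )) 3600    # every hour for 31 days      -> 744
```

The two numbers map directly onto nmon's -s (seconds between snapshots) and -c (number of snapshots) options, so the five minute day would be something like nmon -f -s 300 -c 288.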

But the above is only a physical problem. The much larger logical problem is still there to catch you out and that problem is averaging out. A long time ago I noticed that the shorter the time period that you use to monitor, the more fluctuations you notice in the data.

Philosophy: If you keep using shorter and shorter periods you will eventually see that the CPUs are either 100% busy or 100% idle; all the other numbers are just a feature of humans not thinking fast enough and having to average out the CPU use over longer periods.

Anyway, for performance tuning we need to concentrate on the peaks. Take a look at the below graph:

If we average the whole day we get 50%, which completely hides the peaks during the day time and the heavy CPU load during the evening batch. If this computer was not used during Saturday and Sunday the average might come down to 35%. The point is that averaging data over longer periods removes all the important peaks.
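
To see the averaging problem with numbers rather than a picture, here is a small shell sketch that prints the average and the peak of a list of CPU busy% values. The hourly figures are invented for the example (quiet night, busy day, heavy evening batch) and are not taken from any real machine.

```shell
# Print the average and the peak of a list of CPU busy% values, one per line.
avg_and_peak() {
    awk '{ sum += $1; if ($1 > max) max = $1 }
         END { if (NR) printf "avg=%.1f peak=%.1f\n", sum / NR, max }'
}

# Hypothetical hourly CPU busy% for one day.
printf '%s\n' 5 5 5 5 5 5 10 30 70 90 95 90 60 85 95 90 70 40 30 100 100 100 60 10 |
    avg_and_peak
```

The single average says "about half busy" while the peak shows the machine was flat out for part of the day - and that is the part you need for tuning.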

This is in addition to the data management problem.

Due to these three problems:

    Data overload - too many data points
    Averaging out - eliminates the vital data
    Manipulation - the data will need to be stored, manipulated and displayed - non-trivial

I think many people make the mistake of assuming that long term reports from nmon are an easy task, but it will turn out to be very hard work and often the results are utterly pointless or meaningless.

If you must attempt this then I recommend:

    rrdtool to summarise data for you and draw graphs
    ploticus looks like a good tool
    take a look at Ganglia

Question 50: nmon will not start on AIX 5.1 due to a libperfstat error?

The error is something like:

    exec(): 0509-036 Cannot load program <nmon binary file here> because of the following errors:
    0509-150 Dependent module libperfstat.a(shr.o) could not be loaded.
    0509-022 Cannot load module libperfstat.a(shr.o).
    0509-026 System error: A file or directory in the path name does not exist.

You will need to have installed the libperfstat library from the AIX CDROMs. This is in the bos.perf.libperfstat package.

I hope you realise that AIX 5.1 is not normally supported without extra payments as it is so old.

Question 51: How do I work out the Physical CPU use on Linux on POWER for shared processor LPARs?

Here is a Korn shell script that shows you where to get the data and the maths involved.

#!/usr/bin/ksh
# Read the purr counter, wait, read it again, then convert the
# difference to physical CPUs using the timebase frequency.

before=`grep purr /proc/ppc64/lparcfg | sed 's/purr=//'`
echo before=$before

integer seconds=2
sleep $seconds

after=`grep purr /proc/ppc64/lparcfg | sed 's/purr=//'`
echo after=$after

timebase=`grep timebase /proc/cpuinfo | awk '{print $3 }' `
echo timebase=$timebase

string="($after-$before)/$timebase/$seconds"
echo string $string
bc <<EOF
scale=5
$string
EOF

Question 52: The Disk Busy stats are missing on AIX

If you are watching nmon online it will be flashing this at you:

    To enable disk stats as root: chdev -l sys0 -a iostat=true

This is a big hint on how to switch them on !!!

Question 53: Sort order problems with massive nmon output files.

So you collected more than 9999 snapshots in a single nmon capture. Ignore for a moment the fact that the Excel Analyser can't cope with all this data and that it makes the data unmanageable - we suggest a good aim is between 400 and 700 snapshots per file for good graphs and manageable file sizes. Anyway, you then find out that if you sort the file the rows don't even sort in the right order. The problem is you have four digit and five digit Timeshot numbers - the T numbers. This mucks up the sort ordering. What can you do? Try this on the AIX system (it should work on Linux too). It makes all the T numbers 5 digits and then they can be sorted:

    sed 's/\(,T\)\([0-9][0-9][0-9][0-9]\)\(,\)/\10\2\3/' original.nmon >original5digit.csv
    sort -n original5digit.csv >fixed.csv

Full marks if you understand the sed command - this is very advanced regular expression stuff
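
For those not going for full marks: the expression captures three groups - the ",T" prefix, exactly four digits, and the comma that follows - and writes them back with a literal 0 inserted after the first group (the \10 is group 1 followed by a zero). Five digit T numbers do not match and are left alone. A quick way to convince yourself, using two made-up data lines and a function name invented for the example:

```shell
# Pad four-digit T numbers to five digits so a plain sort orders them correctly.
pad_tnumbers() {
    sed 's/\(,T\)\([0-9][0-9][0-9][0-9]\)\(,\)/\10\2\3/'
}

# Hypothetical data lines either side of the 9999 boundary.
printf '%s\n' 'CPU_ALL,T9999,10.0,2.0' 'CPU_ALL,T10000,11.0,2.5' | pad_tnumbers
```

The first line comes back with T09999 and the second is unchanged.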

Question 54: AIX 5.3 updated but then nmon gives "Illegal instruction(coredump)"

This has been reported shortly after an upgrade to a higher AIX 5.3 ML (like ML5 or ML6) and reboot. After a lot of research and experiments the following was found by a persistent nmon user called Xi Chen. The problem seems to be nmon jumping to a library like libperfstat where the jump vectors are not right, so the library/system call jumps to address zero and attempts to execute instruction zero (invalid, of course). This is a bug in AIX and its update process where the libperfstat kernel package does not match the library. Try the following command:

    # lslpp -L | grep -i perfstat

You may get something like:

    # lslpp -L | grep -i perfstat
    bos.perf.libperfstat 5.3.0.50 C F Performance Statistics Library
    bos.perf.perfstat 5.3.0.60 C F Performance Statistics

Update the package bos.perf.libperfstat to the same (5.3.0.60) or at least much closer levels (like 5.3.0.60 and 5.3.0.61) as bos.perf.perfstat. Preferably, the latest available levels.
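
If you have many machines to check, the comparison can be scripted. The following is a minimal sketch that assumes the lslpp -L layout shown above (fileset name in the first column, level in the second); the function name and messages are made up for the example.

```shell
# Report whether the two perfstat filesets are at the same level.
# Reads "lslpp -L" style lines on stdin.
check_perfstat_levels() {
    awk '$1 == "bos.perf.libperfstat" { lib = $2 }
         $1 == "bos.perf.perfstat"    { krn = $2 }
         END {
             if (lib == "" || krn == "")
                 print "missing fileset"
             else if ((lib "") == (krn ""))      # compare as strings, not numbers
                 print "match " lib
             else
                 print "MISMATCH libperfstat=" lib " perfstat=" krn
         }'
}

# The sample output from above, fed in as a test.
check_perfstat_levels <<'EOF'
bos.perf.libperfstat 5.3.0.50 C F Performance Statistics Library
bos.perf.perfstat 5.3.0.60 C F Performance Statistics
EOF
```

On a real AIX system you would run it as: lslpp -L | check_perfstat_levels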

Question 54: AIX 5.3 updated but then nmon gives "Assert Failure"

This has been reported shortly after an upgrade - some machines have this problem while others don't. There does not seem to be a pattern. There has been a lot of investigation of this issue with tools being written but it is still a mystery. The libperfstat library is claiming that an invalid parameter has been passed but tools have shown this is not true. The three parameters are a pointer to memory (just malloc'ed in the code), the number of adapters (just returned by the previous call to libperfstat) and the size of the diskadapter structure (which has never changed). The output looks like this:

    ERROR: Assert Failure in file="nmon11.c" in function="main" at line=3300
    ERROR: Reason=System call returned -1
    ERROR: Expression=perfstat_diskadapter((perfstat_id_t * )FIRST_DISKADAPTER, p?
    ERROR: errno=22
    ERROR: errno means : Invalid argument

Then it was found that a reboot fixes most of these Assert Failures. We don't fully understand this but it may be adapters in funny states, or kernel modules that need to be reloaded, or libperfstat in a twist - one thing we do know - it's not nmon! If you hit this problem:

    Check the software levels, see the previous question about "Illegal instruction(coredump)"
    Do you think that you rebooted after the upgrade or do you know for absolutely sure!!
    Try: export NMON_IGNORE_ASSERT=1 and then start nmon from this same ksh. This may work around the problem as nmon bravely tries to carry on even with library errors.
    Try the latest beta version of nmon (if it supports your AIX level).
    I know rebooting can be a problem with production systems but it fixes this the vast majority of the time.
    If it is still a problem, let us know via the usual AIX Performance Tools Forum.

Question 55: On AIX 5.3 ML6, nmon output files contain zeros, missing CPU stats, corrupt ZZZ lines and "nfs" strings found in the stats

This is yet another bug in the AIX libperfstat library at this ML6. The NFS data returned to nmon is corrupt and these characters may be output directly from the library (very bad form chaps!).

The work around is:

    Do not include NFS statistics (remove the -N)
    Move to nmon12 that codes around these bugs.

Question 56: Does nmon capture point in time stats or averages?

Well, there are two types of numbers

    rates and
    absolutes.

As an example of an absolute, take free memory - nmon just shows you how much memory is free. As an example of a rate, take the network stats. Here nmon does the following:

    Capture a complete set of counters - these are incremented by the kernel like the number of bytes sent.
    then nmon waits the number of seconds you asked
    then nmon captures a second set of these counters
    then nmon calculates the difference between the two sets and divides by the number of seconds, so everything is per second
    this number is then displayed on screen or written to the data file

So the rates are the average between the two capture points. As the number of seconds increases the rates get more and more steady but note if you reduce the seconds to just one (the minimum to make sure nmon does not use too much CPU time) you will see lots more peaks and dips in the numbers.
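
The calculation in the steps above is nothing more than a difference divided by a time. A minimal shell sketch, with a made-up function name and invented counter values:

```shell
# Per-second rate from two readings of an ever-increasing counter.
# Usage: rate BEFORE AFTER SECONDS
rate() {
    awk -v b="$1" -v a="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (a - b) / s }'
}

# Hypothetical "bytes sent" counter read twice, 2 seconds apart.
rate 1000000 1500000 2    # -> 250000.0 bytes per second
```

Whatever happened inside those 2 seconds (one big burst or a steady trickle) ends up as the same number, which is why shorter intervals show more peaks and dips.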

"Point in time" numbers would be very misleading as they would miss all the peaks and dips in between - you would have to take dozens of them to be sure you are really seeing a representative number. Question 57: Why is the Process memory percentage zero? (same for System and User percent)

This seems to happen in AIX 5.3 TL07 or thereabouts. In fact, it is the AIX libperfstat library, which nmon uses, that has a bug in it that returns a large negative number for the Process% value. The Process, System and User Percentages are approximations (remember memory has many modes, types and uses and some overlap) and the calculation goes wrong.

nmon reports this problem by showing 0% - which is clearly impossible.

The bug was very hard to reproduce and track down because the problem only happens in particular circumstances and with changes in memory use (like starting and stopping large memory applications). I am pretty sure you have a good chance of the number being fixed if you reboot the machine/LPAR, at least for some time, but the problem may reappear.

The fix is to update AIX to AIX 5.3 TL09 (or even better AIX 6) but there may be a PTF or efix. Ask AIX Support for a fix to the libperfstat library that corrects the real_system, real_process and real_user members of the perfstat_memory_total_t structure. That will give them the right details to search for in the Retain database. Do not ask for nmon classic support as the answer could be short and/or rude!

In my experience AIX systems administrators don't like adding these updates to a production machine. So it may be better to just accept that if any of these numbers are zero then you should not use any of these percentages.

Question 100: When will nmon collect data from lots of machines or LPARs?

Answer: Never. I like to think nmon does one job and does it well - it collects data from one machine and saves it in one file. Going multiple machine or LPAR has many problems:

    Collecting data from lots of machines or LPARs would require network access and lots of error handling for missing or late data.

    The nmon output file would then be far more complex and have to include the machine names and totally rewrite the time stamps.
    We already suffer from more data than Excel can handle.
    There would simply be too much data to display
    This complication would mean nmon becomes very large and code stability would take a long time to settle down

What you do need is:

    Less data and then you drill down on particular nodes
    Automated database generation to store the data
    Automated graphing of the data you really want
    History for the last hour, day, week, month, year
    Small simple daemons on the nodes and automated central collection point
    Simple method of collecting more stats
    Open Source code to make it safe and simple to implement.

This tool is called Ganglia, see http://ganglia.sourceforge.net/ and Question 101.

Question 101: When will nmon collect data like "topas -C"?

It may not be obvious but topas and topas -C are two completely different programs hidden in one binary. The cross partition stats involve communicating with each LPAR and the HMC to get the data, unlike the local stats that just call the local kernel API. The cross partition version of nmon has already been written: it is called Ganglia, please see http://www.ibm.com/collaboration/wiki/display/WikiPtype/ganglia for more details. OK, it is an excellent Open Source tool and nothing to do with nmon but it has all the right stats, many brilliant features, is very simple to implement and has very little impact on performance. There is no need to duplicate this work. It also supports lots of operating systems, the output is via a website, the data is in graph form and it keeps historic data - so this is better than text output on a dumb screen and only for root users.

Page last modified on January 03, 2017, at 04:11 PM