Ref: Performance Tuning for Linux® Servers
Ref: System Performance Tuning, Second Edition
Ref: Optimizing Linux® Performance: A Hands-On Guide to Linux® Performance Tools
Linux Performance Tools:
Processor time is organized into four timed modes: system time, user time, I/O wait time, and idle time. The idle time consists of what’s left over when all other portions have had their fill. A program’s normal operating state is user mode, but as it runs, it may generate requests for services that the kernel provides, such as I/O. These requests require the attention of the operating system, so the program switches into system mode, then returns to user mode when the request is complete. The time spent in these two modes is tabulated independently to give the user time and system time values, respectively. These two figures account for the majority of a process’s execution time.
Note that vmstat reports only the user time, system time, and idle time (wait time is summed in with idle time). In order to get separated values for wait time and idle time, use mpstat.
When a process waits for a block device data request to complete, it incurs I/O wait time. This brings up an important fact: when a process is blocked in this fashion, all idle time becomes wait time. If your idle time is zero, as reported by vmstat, the first thing you should check is if your system has I/O throughput problems.
vmstat (Virtual Memory Statistics):
- How many processes are running
- How the CPU is being used
- How many interrupts the CPU receives
- How many context switches the scheduler performs
vmstat [-n] [-s] [delay [count]] vmstat 2 5
Sample output:
[ezolt@scrffy tmp]$ vmstat procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 181024 26284 35292 503048 0 0 3 2 6 1 5 1 94 0
What to look for:
One key to focus on is the (wa), as consistent high numbers here is definitely a problem waiting on I/O. Usually seeing consistent (wa) in 10+ will show degradation of the system. Once it reaches 35+ you will not need to look at the statistics as your users will be complaining.
Look for high numbers in either system or user space. If you see consistent high numbers for user space it could be an application that has a process that is consuming too much resources. In this case look at top to see if you can identify the problem process. If you see consistent high numbers for system then you also need to look into what programs are taking the CPU resources and evaluate their status
To be continue…