Unix: Finding high I/O waiting processes in Linux

We often need to find out which process is generating heavy I/O or is stuck waiting on I/O.

Unfortunately, apart from iotop there is no dedicated real-time tool for this, and iotop is not installed by default in every environment (it was not in mine).

So how can we determine which process is generating the most I/O, or in other words, which process is sitting in uninterruptible sleep?

The first step is the top command. Many of us (being DBAs) do not know that the “wa” value is I/O related.

# top
top - 14:31:20 up 35 min, 4 users, load average: 2.25, 1.74, 1.68
Tasks: 71 total, 1 running, 70 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.3%us, 1.7%sy, 0.0%ni, 0.0%id, 96.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 245440k total, 241004k used, 4436k free, 496k buffers
Swap: 409596k total, 5436k used, 404160k free, 182812k cached

wa -- iowait: the percentage of time the CPU has been waiting for I/O to complete. In the output above it is 96.0%.
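The wa figure comes from the kernel counters in /proc/stat. As a rough sketch only: the function name iowait_pct is made up for this example, and it assumes the usual field order on the aggregate "cpu" line (user nice system idle iowait irq softirq steal). The percentage between two samples could be computed like this:

```shell
# iowait_pct "<cpu line 1>" "<cpu line 2>"
# Prints the percentage of CPU time spent in iowait between two samples of
# the aggregate "cpu" line from /proc/stat.
# Assumed field order after the "cpu" label:
#   user nice system idle iowait irq softirq steal
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # sum user..steal
            t[NR] = total
            w[NR] = $6                             # iowait is the 5th counter
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system: take two samples one second apart.
# a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
# iowait_pct "$a" "$b"
```

This is handy in scripts where parsing the interactive top screen is awkward.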

The second step is to check iostat with the -x (extended statistics) option:

$ iostat -x 2 5
avg-cpu: %user %nice %system %iowait %steal %idle
3.66 0.00 47.64 48.69 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 44.50 39.27 117.28 29.32 11220.94 13126.70 332.17 65.77 462.79 9.80 2274.71 7.60 111.41
dm-0 0.00 0.00 83.25 9.95 10515.18 4295.29 317.84 57.01 648.54 16.73 5935.79 11.48 107.02
dm-1 0.00 0.00 57.07 40.84 228.27 163.35 8.00 93.84 979.61 13.94 2329.08 10.93 107.02

The %util column in the output above shows sda, dm-0 and dm-1 all above 100%, and the await times are in the hundreds of milliseconds, which tells me that my server is I/O bound.
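When there are many devices, picking out the saturated ones by eye gets tedious. Here is a minimal sketch of a filter for that; busy_devices is a hypothetical helper name, and it assumes the header line starts with "Device" and contains a %util column, as in the output above:

```shell
# busy_devices <threshold>
# Reads `iostat -x` output on stdin and prints "device %util" for every
# device whose %util is at or above the threshold. The %util column is
# located by name because its position differs between sysstat versions.
busy_devices() {
    awk -v limit="$1" '
        /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        /^avg-cpu/ || NF == 0 { col = 0; next }   # leaving the device table
        col && NF >= col && $col + 0 >= limit { print $1, $col }
    '
}

# Usage on a live system:
# iostat -x 2 5 | busy_devices 90
```

Looking the column up by name rather than by position is a deliberate choice, since newer iostat versions print a different set of columns.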

The third step is to find the culprit: the process that is stuck in (uninterruptible) sleep waiting for its I/O to complete.

Before doing that, we should know the process state codes that ps reports.

The ps command has statistics for memory and CPU but not for disk I/O. It does, however, show each process's state, which can be used to tell whether or not a process is waiting for I/O.
The ps state field gives the process's current state; below is the list of states from the man page.

PROCESS STATE CODES
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent.

So we can use the D state to see which processes are sleeping while they wait for I/O to complete. Processes waiting for I/O are commonly in the “uninterruptible sleep” state, or “D”; given this, we can simply look for the processes that are constantly in that state.

To do that, a simple loop that runs ps with a few arguments and greps for the D state gives us the processes that are waiting for I/O.

Command to find the processes that are sleeping while waiting for I/O to complete:

# for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done

The above for loop will print the processes in a “D” state every 5 seconds for 10 intervals.

Example output:

D 248 [jbd2/dm-0-8]
D 16528 C++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kswapd0]
D 16528 C++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kswapd0]
D 16528 C++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kswapd0]
D 16528 C++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 16528 C++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
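To make the repeat offender stand out, the samples can be tallied per PID. This is a minimal sketch, assuming the three-column state,pid,cmd format produced by the loop; tally_d_state is a name made up for this example:

```shell
# tally_d_state
# Reads the loop output on stdin (lines of "state pid cmd", samples
# separated by "----") and prints "count pid command" for each PID seen in
# the D state, most frequent first.
tally_d_state() {
    awk '
        $1 == "D" {
            count[$2]++
            cmd[$2] = $3          # first word of the command line
        }
        END {
            for (pid in count) print count[pid], pid, cmd[pid]
        }
    ' | sort -k1,1nr -k2,2n
}

# Usage on a live system:
# for x in $(seq 1 10); do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done | tally_d_state
```

Fed the example output, this prints "5 16528 C++" first, followed by "3 22 [kswapd0]" and "1 248 [jbd2/dm-0-8]".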

As you can see, process 16528 is constantly in I/O wait sleep, working on /tmp. To drill down further, the fourth step is to look at the I/O statistics for this process:

# cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0

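So this process has read approximately 47 MB from disk and written approximately 524 MB to it (read_bytes and write_bytes); its command line points to /tmp as the target.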
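The byte counters are easier to read in megabytes. A small sketch of the conversion, assuming the "name: value" layout of /proc/<pid>/io shown above (io_mib is a hypothetical helper name):

```shell
# io_mib
# Reads the contents of /proc/<pid>/io on stdin and prints read_bytes and
# write_bytes converted to MiB (1 MiB = 1048576 bytes).
io_mib() {
    awk -F': *' '
        $1 == "read_bytes" || $1 == "write_bytes" {
            printf "%s %.2f MiB\n", $1, $2 / 1048576
        }
    '
}

# Usage on a live system (needs root or the process owner):
# io_mib < /proc/16528/io
```

For the counters above this gives 46.75 MiB read and 524.48 MiB written.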

Now that I have identified the top process causing high I/O, the next step is to determine which filesystem the I/O is going to. We are familiar with the tool for this: just use lsof.

# lsof -p 16528
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
C++ 16528 root cwd DIR 252,0 4096 130597 /tmp
<truncated>
C++ 16528 root 8u REG 252,0 501219328 131869 /tmp/C++.16528
C++ 16528 root 9u REG 252,0 501219328 131869 /tmp/C++.16528
C++ 16528 root 10u REG 252,0 501219328 131869 /tmp/C++.16528
C++ 16528 root 11u REG 252,0 501219328 131869 /tmp/C++.16528
C++ 16528 root 12u REG 252,0 501219328 131869 /tmp/C++.16528

Finally, df /tmp will show you the disk/device that the filesystem is mounted from.
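If only the device and mount point are of interest, the df output can be trimmed down. A rough sketch, where mount_of is a made-up function name and the device name is assumed to contain no spaces:

```shell
# mount_of <path>
# Prints "device mountpoint" for the filesystem that holds the given path.
# df -P forces the one-line-per-filesystem POSIX format, so row 2 is the data.
mount_of() {
    df -P "$1" | awk 'NR == 2 { print $1, $6 }'
}

# Usage on a live system:
# mount_of /tmp/C++.16528
```

The device it prints can then be matched against the device names in the iostat output.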

So we have used a top-down approach, as follows:

1) Using top, we identified whether any I/O waits are appearing (the “wa” value)

2) Using iostat -x 2 5, we checked and confirmed from the %util column that the system is I/O bound

3) Using a simple loop and the process state “D”, we checked which process is sleeping on I/O

4) Using the process statistics in /proc/pid/io, we confirmed how much this process is reading and writing

5) Finally, using lsof we found which files the process is touching, and using df on those files we identified which filesystem is taking the I/O

Thanks

Geek DBA

1 comment to Unix: Finding high I/O waiting process in linux

  • venkat

    Thank You for the post. Its surely useful for me to improve my knowledge at OS level.

    Thank You very much.