Monday, September 22, 2014

Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem

Read full details here: Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem

Linux Kernel panic issue

Short description: This guide shows how to fix hung_task_timeout_secs and blocked for more than 120 seconds problem in Linux.

Background

My server became unresponsive today (around 15:38hrs)
I’ve collected following logs that shows Memory and CPU usage and narrowed down /var/log/messages.
After doing a hard reboot, it came back online but I was unable to access it via VNC or SSH.
VNC connection showed an error (many errors but all contained /proc/sys/kernel/hung_task_timeout_secs“)
INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
 Not tainted 2.6.32-431.11.2.el6.x86_64 #1
 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Step by step troubleshooting data and logs


Check Memory usage

Following log shows server memory usage
someuser@servercore [/var/log]# sar -r

15:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
15:20:01 476604 1396772 74.56 110140 707116 1201652 30.64
15:30:02 526240 1347136 71.91 110412 710536 1165148 29.71

15:55:53 LINUX RESTART

16:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
16:10:01 517168 1356208 72.39 136040 588964 1196724 30.52
16:20:01 510580 1362796 72.75 137488 596560 1191664 30.39
As you can see, it’s not that high and I had plenty of free Memory.

Check CPU usage

Following log shows CPU usage.
someuser@servercore [/var/log]# sar -u
15:00:01 CPU %user %nice %system %iowait %steal %idle
15:20:01 all 6.01 0.04 1.74 1.59 0.14 90.48
15:30:02 all 4.97 0.04 1.54 7.87 0.15 85.44
Average: all 7.20 0.06 2.19 2.69 0.26 87.60

15:55:53 LINUX RESTART

16:00:01 CPU %user %nice %system %iowait %steal %idle
16:10:01 all 9.13 0.04 2.78 6.98 0.31 80.76
16:20:01 all 4.21 0.04 1.39 3.49 0.15 90.73
Again, CPU wasn’t at 100%. This is now getting annoying that I can’t explain why I am getting into s**tstorm for nothing.
Let’s check //var/log/messages to find all the error logs related this this kernel panic

Check Kernel Panic Logs

Now I am getting somewhere …
someuser@servercore [/var/log]# grep 'Aug 22 15' messages | grep -v Firewall | grep -v blackmore | grep -v operational | grep -v ec2
Aug 22 15:38:05 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:38:05 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:05 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:05 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task rs:main Q:Reg:1035 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task queueprocd - qu:1793 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: queueprocd - D 0000000000000000 0 1793 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task httpd:30439 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:07 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:07 servercore kernel: httpd D 0000000000000000 0 30439 2223 0x00000080
Aug 22 15:38:07 servercore kernel: Call Trace:
Aug 22 15:38:11 servercore kernel: INFO: task httpd:30482 blocked for more than 120 seconds.
Aug 22 15:38:11 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:11 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:11 servercore kernel: httpd D 0000000000000000 0 30482 2223 0x00000080
Aug 22 15:38:11 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task flush-253:0:263 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: flush-253:0 D 0000000000000000 0 263 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:56 servercore kernel: INFO: task rs:main Q:Reg:1035 blocked for more than 120 seconds.
Aug 22 15:39:56 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:56 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:56 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:39:56 servercore kernel: Call Trace:
Aug 22 15:42:11 servercore kernel: Clocksource tsc unstable (delta = -8589964877 ns)

15:55:53 LINUX RESTART
As you can see all the errors contained “echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.” and “blocked for more than 120 seconds” somewhere.
Now let’s fix this problem once and for all..




Read the rest of it here: Linux Kernel panic issue: How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem

No comments:

Post a Comment