Fixing NodeManager Crashes Caused by OOM


Original blog post
hackershell

After switching from JDK1.6_25 to JDK1.7_45, NodeManagers (NM) in the cluster started dying frequently. The NM log showed the following:

2015-08-12 16:35:06,662 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,10,system] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.UNIXProcess$ProcessPipeInputStream.drainInputStream(UNIXProcess.java:267)
at java.lang.UNIXProcess$ProcessPipeInputStream.processExited(UNIXProcess.java:280)
at java.lang.UNIXProcess.processExited(UNIXProcess.java:187)
at java.lang.UNIXProcess$3.run(UNIXProcess.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

and a similar one:

2015-08-12 16:37:56,893 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,10,system] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Java heap space
at java.lang.UNIXProcess$ProcessPipeInputStream.drainInputStream(UNIXProcess.java:267)
at java.lang.UNIXProcess$ProcessPipeInputStream.processExited(UNIXProcess.java:280)
at java.lang.UNIXProcess.processExited(UNIXProcess.java:187)
at java.lang.UNIXProcess$3.run(UNIXProcess.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
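
To see how often and on which nodes this happens, the two excerpts above can be turned into a simple log check. The following is only a minimal sketch in Python: the function name `find_reaper_ooms` is made up for illustration, and it assumes the NM log keeps the same layout as the excerpts (a FATAL line for the `process reaper` thread immediately followed by the `java.lang.OutOfMemoryError` line).

```python
import re

# Matches the FATAL line logged by YarnUncaughtExceptionHandler when the
# "process reaper" thread dies, as in the excerpts above.
FATAL_RE = re.compile(
    r"FATAL org\.apache\.hadoop\.yarn\.YarnUncaughtExceptionHandler: "
    r"Thread Thread\[process reaper,"
)
OOM_RE = re.compile(r"^java\.lang\.OutOfMemoryError: (.*)$")


def find_reaper_ooms(lines):
    """Return (timestamp, oom_message) pairs, one per reaper-thread OOM.

    Hypothetical helper: assumes the OOM line directly follows the FATAL line.
    """
    hits = []
    pending_ts = None  # timestamp of a FATAL line still waiting for its OOM line
    for line in lines:
        line = line.rstrip("\n")
        if FATAL_RE.search(line):
            # The first 23 characters are "YYYY-MM-DD HH:MM:SS,mmm".
            pending_ts = line[:23]
            continue
        if pending_ts is not None:
            match = OOM_RE.match(line.strip())
            if match:
                hits.append((pending_ts, match.group(1)))
            pending_ts = None
    return hits
```

Passing an open log file (any iterable of lines) to `find_reaper_ooms` gives a list that can be counted or grouped by the OOM message, which here would be either `Requested array size exceeds VM limit` or `Java heap space`.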

Searching Google for the keywords hadoop UNIXProcess drainInputStream turned up some JDK7 bugs that cause OOM errors when the NM is under heavy load. See HADOOP-10146 for details.

and some related explanations:

JDK-8027348

JDK-8024521

After later switching to JDK1.7_67, the OOM problem no longer appeared.
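
When rolling the JDK change out across a cluster, it can help to flag nodes that still run a JDK7 build older than the one that worked here. The sketch below is illustrative only: the helper names are hypothetical, it assumes that JDK1.7_67 means version string `1.7.0_67` as printed by `java -version`, and the threshold reflects only what this post observed (1.7_45 failed, 1.7_67 did not). Builds in between were not tested here.

```python
import re

# Assumption: the build this post found to be free of the OOM, i.e. 1.7.0_67.
KNOWN_GOOD = (1, 7, 0, 67)


def parse_jdk_version(text):
    """Extract (major, minor, patch, update) from text such as
    'java version "1.7.0_45"'; return None if no such version is found."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)_(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_jdk7_older_than_known_good(text):
    """True only for a JDK7 build whose update number is below KNOWN_GOOD."""
    version = parse_jdk_version(text)
    if version is None:
        return False
    return version[:2] == (1, 7) and version < KNOWN_GOOD
```

Feeding each node's `java -version` output to `is_jdk7_older_than_known_good` marks the nodes that are still candidates for the crash described above.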