This week, we have been experiencing performance issues with our production Zenoss Core 4.2.3 instance.
We monitor more than 400 nodes, which carry very sensitive information.
The server details are below:
Version : Zenoss Core 4.2.3 (VMware)
OS : Linux (x86_64) 2.6.32
Database : MySQL 5.5.28 (5.5.28)
System Load :
Cpu(s): 91.2%us, 4.8%sy, 0.4%ni, 2.5%id, 0.9%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 10129996k total, 9991592k used, 138404k free, 60380k buffers
Swap: 8208376k total, 2953348k used, 5255028k free, 6070304k cached
09:51:38 up 1 day, 22:18, 3 users, load average: 112.41, 126.38, 133.55
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 2
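To put the load figures above in context, the 1-minute load average can be divided by the number of online CPUs (8 here, from the `0-7` list). This is only a rough sketch of that arithmetic; `load_per_cpu` is a hypothetical helper name, not a Zenoss or system tool.

```shell
# load_per_cpu LOAD NCPU
# Print the load average divided by the CPU count (hypothetical helper).
# Values well above 1.00 mean more runnable tasks than CPUs.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# 1-minute load average and CPU count taken from the output above
load_per_cpu 112.41 8   # prints 14.05
```

A value of about 14 per CPU, together with only 2.5% idle time, suggests the box is heavily CPU-bound rather than waiting on I/O (0.9%wa).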
After some investigation, I found that the following process is using more than 100% CPU and a large share of memory. I don't know why:
[zeno@zenoss5 ~]$ ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head
PID PPID CMD %MEM %CPU
3118 1 java -server -XX:+HeapDumpO 54.9 707
3312 1 /opt/zenoss/bin/python /opt 0.8 17.4
2127 2013 /usr/sbin/mysqld --basedir= 3.1 12.3
[zeno@zenoss5 ~]$ ps -ef | grep java
zenoss 3118 1 99 Feb15 13-16:03:45 java -server -XX:+HeapDumpOnOutOfMemoryError -DZENOSS_COMMAND=zeneventserver -DZENHOME=/opt/zenoss -Djetty.home=/opt/zenoss -Djetty.logs=/opt/zenoss/log -Dlogback.configurationFile=/opt/zenoss/etc/zeneventserver/logback.xml -DZENOSS_DAEMON=y -jar /opt/zenoss/lib/jetty-start-7.5.3.v20111011.jar --config=/opt/zenoss/etc/zeneventserver/jetty/start.config --ini=/opt/zenoss/etc/zeneventserver/jetty/jetty.ini --pre=etc/zeneventserver/jetty/jetty-logging.xml
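One way to narrow down what a busy JVM such as PID 3118 is doing is to list its hottest threads and match them against a thread dump. The sketch below assumes a Linux procps `ps` and that the JDK's `jstack` tool is installed, which may not be the case on every Zenoss host; `busiest_threads` and `tid_to_nid` are hypothetical helper names.

```shell
# busiest_threads PID [COUNT]
# List the threads of one process sorted by CPU usage (thread ID, %CPU).
busiest_threads() {
    ps -L -p "$1" -o tid=,pcpu= --sort=-pcpu | head -n "${2:-5}"
}

# tid_to_nid TID
# Convert a decimal thread ID to the hex form shown in the nid= field
# of a jstack thread dump.
tid_to_nid() {
    printf '0x%x\n' "$1"
}
```

For example, run `busiest_threads 3118`, pass one of the reported thread IDs to `tid_to_nid`, and search for that value in the output of `jstack 3118` (run as the user that owns the process) to see which Java thread is consuming the CPU.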
If any other information is needed, please let me know.
Our Zenoss has been unresponsive for more than 3 days, so any kind of help would be really appreciated.
I had a similar issue a couple of weeks ago, but on Zenoss 5. It came down to a bad transform that caused a backlog of 400,000 events in Zep.RawEvents, which is related to zeneventd.
Check out this page http://wiki.zenoss.org/ZenOSS_Logical_Model and check your event queues.
Here's info on Queue Troubleshooting to determine what might be causing your queues to backlog: http://wiki.zenoss.org/Queue_Troubleshooting
I was able to look at my queues, determined Zep.RawEvents was backed up, determined that zeneventd was the culprit, checked the log files in /opt/zenoss/log/ and found the transform(s) causing issues.
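The queue check described above can be sketched roughly as follows. It assumes the queues live in RabbitMQ under a `/zenoss` vhost and that `rabbitmqctl` is available, which may differ on your install; `backed_up_queues` is a hypothetical helper name and the default threshold of 1000 is arbitrary.

```shell
# backed_up_queues [LIMIT]
# Read "name messages" lines on stdin and print the queues whose message
# count exceeds LIMIT (default 1000). Lines without a numeric second
# field, such as a "Listing queues ..." header, are skipped.
backed_up_queues() {
    awk -v limit="${1:-1000}" \
        '($2 ~ /^[0-9]+$/) && ($2 + 0 > limit) { print $1 "\t" $2 }'
}

# Intended usage (the vhost name is an assumption):
#   rabbitmqctl -q list_queues -p /zenoss name messages | backed_up_queues 1000
```

Any queue this prints is a candidate for the kind of backlog described above; the log of the daemon that consumes it, under /opt/zenoss/log/, is the next place to look.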
I also committed more memory to zeneventd and zeneventserver in the end.