Configuration & Administration

  • 1.  Health Checks for opentsdb & Central Query failing

    Posted 08-21-2017 11:18 AM

    Hi Zenoss Community,

    I was hoping someone could assist here.

    We have an instance of Zenoss Core 5.2.6 that has been running without issues. All services checked out on day one of the deployment a couple of months ago.

    Today I decided to do some routine health checks and maintenance on the server itself and found that opentsdb and Central Query are failing their health checks; each shows a red circle with an exclamation mark. Hovering over it lists "Answering" as the failing check. Zenoss itself works: everything is accessible and monitoring as it should.

    Any ideas what could've gone wrong here? Is it even important?

    Here are the top 10 lines of the Central Query log file:

    August 21st 2017, 18:05:55.230 0
    INFO [2017-08-21 16:05:48,655] org.apache.http.impl.client.DefaultHttpClient: I/O exception (java.net.SocketException) caught when processing request: Connection reset
    August 21st 2017, 18:05:55.230 0
    INFO [2017-08-21 16:05:48,655] org.apache.http.impl.client.DefaultHttpClient: Retrying request
    August 21st 2017, 18:04:50.227 0
    INFO [2017-08-21 16:04:43,401] org.apache.http.impl.client.DefaultHttpClient: Retrying request
    August 21st 2017, 18:04:50.227 0
    INFO [2017-08-21 16:04:43,401] org.apache.http.impl.client.DefaultHttpClient: I/O exception (java.net.SocketException) caught when processing request: Connection reset
    August 21st 2017, 18:03:45.225 0
    INFO [2017-08-21 16:03:38,165] org.apache.http.impl.client.DefaultHttpClient: I/O exception (java.net.SocketException) caught when processing request: Connection reset
    August 21st 2017, 18:03:45.225 0
    INFO [2017-08-21 16:03:38,165] org.apache.http.impl.client.DefaultHttpClient: Retrying request
    August 21st 2017, 18:01:30.880 0
    INFO [2017-08-21 16:01:29,827] org.apache.http.impl.client.DefaultHttpClient: I/O exception (java.net.SocketException) caught when processing request: Connection reset
    August 21st 2017, 18:01:30.880 0
    INFO [2017-08-21 16:01:29,827] org.apache.http.impl.client.DefaultHttpClient: Retrying request
    August 21st 2017, 18:00:35.878 0
    INFO [2017-08-21 16:00:29,259] org.apache.http.impl.client.DefaultHttpClient: Retrying request
    August 21st 2017, 18:00:35.878 0
    INFO [2017-08-21 16:00:29,259] org.apache.http.impl.client.DefaultHttpClient: I/O exception (java.net.SocketException) caught when processing request: Connection reset
    August 21st 2017, 17:59:30.876 0
    INFO [2017-08-21 15:59:29,037] org.apache.http.impl.client.DefaultHttpClient: Retrying request


    opentsdb Log file:


    Starting opentsdb with ZK_QUORUM=localhost:2181
    2017-08-21 16:12:58,960 CRIT Supervisor running as root (no user in config file)
    2017-08-21 16:12:59,060 INFO RPC interface 'supervisor' initialized
    2017-08-21 16:12:59,061 CRIT Server 'inet_http_server' running without any HTTP authentication checking
    2017-08-21 16:12:59,062 INFO supervisord started with pid 38
    2017-08-21 16:13:00,110 INFO spawned: 'tsdbwatchdog' with pid 41
    2017-08-21 16:13:00,114 INFO spawned: 'opentsdb' with pid 42
    2017-08-21 16:13:01,354 INFO success: tsdbwatchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2017-08-21 16:13:03,339 INFO exited: opentsdb (exit status 1; not expected)
    2017-08-21 16:13:04,343 INFO spawned: 'opentsdb' with pid 123
    2017-08-21 16:13:06,104 INFO exited: opentsdb (exit status 1; not expected)
    2017-08-21 16:13:08,111 INFO spawned: 'opentsdb' with pid 208
    2017-08-21 16:13:09,573 INFO exited: opentsdb (exit status 1; not expected)
    2017/08/21 16:13:11 200 37.461288ms POST /api/metrics/store
    2017-08-21 16:13:12,581 INFO spawned: 'opentsdb' with pid 293
    2017-08-21 16:13:14,282 INFO exited: opentsdb (exit status 1; not expected)
    2017-08-21 16:13:15,284 INFO gave up: opentsdb entered FATAL state, too many start retries too quickly
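
    The supervisord lines above show a restart loop: opentsdb is spawned, exits with status 1 within a few seconds, and after several attempts supervisord gives up and marks it FATAL. As a minimal sketch for spotting this pattern in a saved copy of the log, the Python below counts unexpected exits per program and flags the FATAL state. The function name `summarize_supervisor_log` and the `SAMPLE` constant are hypothetical names chosen for illustration; the patterns are written only against the line formats visible in this excerpt and may need adjusting for other supervisord versions.

```python
import re

# Patterns written against the supervisord lines quoted in this thread
# (an assumption: other supervisord versions may word these differently).
EXIT_RE = re.compile(r"INFO exited: (\S+) \(exit status (\d+); not expected\)")
FATAL_RE = re.compile(r"INFO gave up: (\S+) entered FATAL state")


def summarize_supervisor_log(text):
    """Return {program: {"unexpected_exits": int, "fatal": bool}}."""
    summary = {}
    for line in text.splitlines():
        exit_match = EXIT_RE.search(line)
        fatal_match = FATAL_RE.search(line)
        match = exit_match or fatal_match
        if match is None:
            continue  # not a line we care about (spawned, POST, etc.)
        entry = summary.setdefault(
            match.group(1), {"unexpected_exits": 0, "fatal": False}
        )
        if exit_match:
            entry["unexpected_exits"] += 1
        else:
            entry["fatal"] = True
    return summary


# Lines copied from the opentsdb log in this post.
SAMPLE = """\
2017-08-21 16:13:00,114 INFO spawned: 'opentsdb' with pid 42
2017-08-21 16:13:01,354 INFO success: tsdbwatchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2017-08-21 16:13:03,339 INFO exited: opentsdb (exit status 1; not expected)
2017-08-21 16:13:04,343 INFO spawned: 'opentsdb' with pid 123
2017-08-21 16:13:06,104 INFO exited: opentsdb (exit status 1; not expected)
2017-08-21 16:13:08,111 INFO spawned: 'opentsdb' with pid 208
2017-08-21 16:13:09,573 INFO exited: opentsdb (exit status 1; not expected)
2017/08/21 16:13:11 200 37.461288ms POST /api/metrics/store
2017-08-21 16:13:12,581 INFO spawned: 'opentsdb' with pid 293
2017-08-21 16:13:14,282 INFO exited: opentsdb (exit status 1; not expected)
2017-08-21 16:13:15,284 INFO gave up: opentsdb entered FATAL state, too many start retries too quickly
"""

if __name__ == "__main__":
    for program, info in summarize_supervisor_log(SAMPLE).items():
        print(program, info)
```

    For the excerpt above, this reports four unexpected exits and a FATAL state for opentsdb, while tsdbwatchdog does not appear because it never exited. The summary only confirms that the process is crashing on startup; the reason for exit status 1 would have to come from opentsdb's own output, which is not included here.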

    Really appreciate any assistance.

    Regards,



    ------------------------------
    Louis Henn
    Senior IT Administrator
    Cape Town
    ------------------------------


  • 2.  RE: Health Checks for opentsdb & Central Query failing

    Posted 10-13-2017 05:03 PM
    Hi,

    It is really important for opentsdb and Central Query to pass all health checks. Try restarting them, and if that does not fix it, please restart all the Zenoss services.

    Let me know.

    ------------------------------
    Irshad.
    ------------------------------



  • 3.  RE: Health Checks for opentsdb & Central Query failing

    Posted 09-05-2018 03:51 AM
    Hello,

    I currently have a similar issue where the health checks for opentsdb (reader and writer) and CentralQuery are failing.
    Were you able to solve your issue?

    Kind regards,

    Laurent

    ------------------------------
    Laurent Hemeryck
    Monitoring Engineer
    FedNot
    ------------------------------