Configuration & Administration

Zenpython stops collecting after Kerberos ticket is renewed

  • 1.  Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-03-2019 11:21 AM
    Hi Zenoss Community,

    I am running Zenoss Core 4.2.5 and using WinRM to monitor 500+ Windows servers. I have the latest Microsoft Windows, PythonCollector, and ZenPackLib ZenPacks installed. I'm using logical zenpython daemons with about 50 servers assigned to each daemon. This setup ran without problems for months, but recently some of my servers have started to stop collecting data at specific times of day. There are gaps in the RRD performance graphs, and new data is not picked up until I restart the corresponding zenpython daemon.

    Looking at the zenpython logs, the data collection stops at the time of the Kerberos ticket renewal. It looks like the Kerberos ticket is renewed every 7-8 hours, and about 2-3 servers from each daemon stop collecting data shortly after the renewal process. Below is an example of the log entries in one of the zenpython.log files:

    2018-11-17 06:15:56,028 INFO zen.zenpython: 44 devices processed (50580 datapoints)
    2018-11-17 06:15:56,037 INFO zen.collector.scheduler: Tasks: 135 Successful_Runs: 6146 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0
    2018-11-17 06:19:50,710 ERROR zen.MicrosoftWindows: WindowsServiceLog: failed collection - (('The referenced context has expired', 786432), ('Success', 100001)) EXAMPLE-SERVER-01
    2018-11-17 06:19:53,333 ERROR zen.MicrosoftWindows: WindowsServiceLog: failed collection - (('The referenced context has expired', 786432), ('Success', 100001)) EXAMPLE-SERVER-01
    2018-11-17 06:20:56,080 INFO zen.maintenance: Performing periodic maintenance
    2018-11-17 06:20:56,080 INFO zen.zenpython: Counter eventCount, value 961826735


    I understand that this means the Kerberos ticket has expired, but this log entry is only generated for the servers that stop collecting data. This happens to about 2-3 out of 50 servers on each zenpython daemon throughout the day.
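    To see which servers are affected and how often, the error lines can be tallied per host. This is only a rough sketch: expired_hosts is a made-up helper name, and it assumes the host name is the last whitespace-separated field on the error line, as in the excerpt above.

```shell
#!/bin/bash
# Sketch: count "referenced context has expired" errors per host in one log file.
# Assumption: the host name is the last field of each error line.
# expired_hosts is a hypothetical helper name, not part of Zenoss.
expired_hosts() {
    local logfile="$1"
    awk '/The referenced context has expired/ { count[$NF]++ }
         END { for (host in count) print count[host], host }' "$logfile" | sort -k2
}
```

    Call it with the path to one zenpython log file. Each output line is a count followed by a host name.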

    Looking at the Kerberos credentials cache file, I notice most, if not all, tickets are set to expire at the same time. This would suggest that they are also all getting renewed around the same time. My theory is that too many servers are set to get their tickets renewed at the same time and some servers are falling through the cracks.

    Is there any way to spread the Kerberos ticket renewals? I am not 100% sure this is the problem, but it's my only theory after troubleshooting and researching for days.

    Thanks,

    Dan



    ------------------------------
    Daniel
    ------------------------------


  • 2.  RE: Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-07-2019 10:46 AM
    UPDATE: In case this helps anyone, I was able to get rid of the "referenced context has expired" log entries and get consistent data collection for 3 days straight by setting up a cron job to renew the Kerberos ticket every 4 hours.

    There is probably a bigger underlying issue here, but I have not been able to get to the root cause yet.

    ------------------------------
    Daniel
    ------------------------------



  • 3.  RE: Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-08-2019 02:47 AM
    Edited by Georg Kraler 01-08-2019 02:48 AM
    Daniel,

    We see similar issues on Zenoss 5 through 6.2.1 (with the latest ZenPacks) in our environment (~200 Windows servers).
    Several attempts, including work with Zenoss Professional Services, led neither to a solution nor to any clue about the root cause.
    That's why we find your workaround quite interesting.
    Could you please add some further details about the cron job you mentioned, especially how you renew the Kerberos ticket?

    Thanks in advance and Best Regards,

    ------------------------------
    Georg
    ------------------------------



  • 4.  RE: Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-08-2019 08:18 AM
    Hi Georg,

    You can use the klist command to view the existing Kerberos tickets and their expiration times. You need to specify your Kerberos cache file, which on Zenoss Core 4 is found by default inside /opt/zenoss/var/krb5cc:

    klist -c /opt/zenoss/var/krb5cc/<your-credentials-cache-file>

    You can manually renew the Kerberos tickets using the kinit command, like so:

    kinit -R -c /opt/zenoss/var/krb5cc/<your-credentials-cache-file>
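    Note that kinit -R can only renew a ticket that is still valid, so it may help to check the cache first. Below is a minimal sketch, not something from my setup: renew_cache is a hypothetical helper name, and the KLIST and KINIT variables are my own additions so the binaries can be swapped out. It relies on klist -s, which prints nothing and only sets the exit status.

```shell
#!/bin/bash
# Sketch: renew a credentials cache only if klist reports it as still valid.
# KLIST and KINIT can be overridden; renew_cache is a hypothetical helper name.
KLIST="${KLIST:-klist}"
KINIT="${KINIT:-kinit}"

renew_cache() {
    local ccache="$1"
    # klist -s is silent and exits non-zero when the cache cannot be read
    # or its tickets have expired.
    if ! "$KLIST" -s -c "$ccache"; then
        echo "cache $ccache is missing or expired; kinit -R cannot renew it" >&2
        return 2
    fi
    # kinit -R asks the KDC to renew the existing ticket-granting ticket.
    if ! "$KINIT" -R -c "$ccache"; then
        echo "renewal failed for $ccache" >&2
        return 1
    fi
    echo "renewed $ccache"
}
```

    A return code of 2 means a plain renewal is no longer possible and a fresh ticket would have to be obtained some other way.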

    This command on its own would not run properly inside cron, so I set up a shell script called kerberos_renewal.sh with the following contents:

    #!/bin/bash

    # location of the shell script that initializes the zenoss environment
    ZENOSS_ENV=~zenoss/.bashrc

    # credentials cache to renew (substitute your own file name)
    CCACHE="/opt/zenoss/var/krb5cc/<your-credentials-cache-file>"

    # print the error message passed to stderr and exit with a return code of 1 (error)
    fail() {
        echo "$*" >&2
        exit 1
    }

    #
    # main script starts here
    #

    # set up the environment
    test -f "${ZENOSS_ENV}" || fail "Source environment not found"
    . "${ZENOSS_ENV}"

    # renew the ticket; report an error if kinit fails
    /usr/bin/kinit -R -c "${CCACHE}" || fail "Kerberos ticket renewal failed"


    My crontab entry looks like this:

    # Renew Kerberos tickets every 4 hours
    0 */4 * * * /opt/zenoss/bin/kerberos_renewal.sh
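    On the original question of spreading renewals out: if several caches are renewed by separate jobs, one option might be a random delay before each renewal. This is an untested idea rather than a confirmed fix, since the simultaneous-renewal theory was never proven. The function names and the delay value are hypothetical.

```shell
#!/bin/bash
# Sketch: add a random delay so several renewal jobs do not fire together.
# jitter_seconds and renew_with_jitter are hypothetical helper names.

# Print a random whole number of seconds in the range 0..max-1.
jitter_seconds() {
    local max="$1"
    # RANDOM is a bash built-in yielding 0..32767.
    echo $(( RANDOM % max ))
}

# Sleep for a random delay below max seconds, then run the given command.
renew_with_jitter() {
    local max="$1"
    shift
    sleep "$(jitter_seconds "$max")"
    "$@"
}
```

    For example, a wrapper script could call renew_with_jitter 600 /opt/zenoss/bin/kerberos_renewal.sh to delay each run by up to ten minutes.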

    Hope this helps.

    Daniel

    ------------------------------
    Daniel
    ------------------------------



  • 5.  RE: Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-09-2019 03:11 AM
    Great information! Many thanks, Daniel.
    Cheers,
    Jane

    ------------------------------
    Jane Curry
    Skills 1st United Kingdom
    jane.curry@skills-1st.co.uk
    ------------------------------



  • 6.  RE: Zenpython stops collecting after Kerberos ticket is renewed

    Posted 01-09-2019 04:54 PM

    Thanks Daniel. Great info.

    We are about to go to RM 6.2.1 and start monitoring Windows servers, so this will save me tearing my hair out over it :)



    ------------------------------
    Shane, Australia
    ZenN00b
    ------------------------------