Maximo Open Forum

  • 1.  KPIs stopped running

    Posted 11-10-2021 11:42
    Sunday night, November 7th, at 2 AM was the last time my KPIs ran.  I've tried rescheduling and reloading the Cron Task Instance, but it's still not running.  Tips for troubleshooting?

    I did recently add several more KPIs, but that was on November 2nd.  They ran successfully until 2021-11-07 02:06.  Nothing (neither the new KPIs I just added nor the ones that had already existed) has run via the Cron Task since then, though a few I have manually updated.

    I have the feeling I'm going to need to create multiple Cron Task Instances and spread this out so it doesn't all run at the same time.  But if this were the cause of the problem, it doesn't make sense to me that it would have worked fine for several days and then just stopped.
    #Administration
    #EverythingMaximo
    #Reporting

    ------------------------------
    Travis Herron
    Pensacola Christian College
    ------------------------------


  • 2.  RE: KPIs stopped running

    Posted 11-10-2021 13:26
    Update:  I've added several more Cron Task Instances and moved most of the KPIs to these new instances.  They all seem to be running fine for now.  Just can't get the original OOTB KPINONREALTIME one to run.  I'm watching the Cron Task History and it's got several Starts and Stops from when I reloaded it, but it's not running.

    I've got scheduled maintenance downtime on it tomorrow  -- Maximo will get rebooted.  Maybe that'll wake it up. . .

    ------------------------------
    Travis Herron
    Pensacola Christian College
    ------------------------------



  • 3.  RE: KPIs stopped running

    Posted 11-10-2021 18:40
    Possibly related... I had a similar issue with an external system feeding data into Maximo: the cron task/process that creates new service requests from the external system stopped running. The last service request from the external system was November 7th at 2:04 AM. Maximo restarts itself every morning at 3 AM and the issue resolved itself after the scheduled restart on Tuesday November 10th. I did not change anything in Maximo, so I'm guessing it was related to the daylight saving time change.

    ------------------------------
    justin haley
    UC San Diego
    ------------------------------



  • 4.  RE: KPIs stopped running

    Posted 11-11-2021 09:12
    Can you query the database? If you can, run this query and see if your task comes back in the list.

    SELECT * FROM taskscheduler WHERE lastrun>lastend

    If for some reason Maximo didn't record in that table that the task completed successfully, reloading the cron task won't fix it. This could be due to a couple of reasons. The most likely is that it encountered an unhandled exception during execution, so it never recorded the finish properly. I've seen this with a few email listener issues. For example, the sanitization policy has a prebuilt limit of around 200 KB, so when a large email thread came in, the email listener would stop and require a server restart.

    There was a more abstract issue I identified a while ago (I can track down the APAR if it will help) where the cron task could complete successfully but set the lastend to a value before the last start (lastrun). It was extremely rare. MXServer.getDate() calculates the difference between the application server and database server clocks every hour and uses it to adjust the date it returns, so that no matter how a date in Maximo is set (using either the database server or the application server), the times should be similar. If that delta changes when it gets recalculated while the cron task is in the middle of running, the task can end up with a last end before the last start, which stops the cron task. The change IBM implemented was to compare the dates before setting lastend, and to set it equal to the last start if the lastend would otherwise be before it.
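
    To make the mechanism above concrete, here is a minimal Python sketch. It is not Maximo's code: `adjusted_now`, `safe_last_end`, and the offset values are hypothetical, chosen only to mirror a start at 2:00 AM and an end at 1:10 AM. It shows how a cached clock offset that changes mid-run can produce an end time before the start time, and how a simple comparison before saving avoids that.

```python
from datetime import datetime, timedelta


def adjusted_now(app_clock: datetime, offset: timedelta) -> datetime:
    """Return the app-server time shifted by the cached app/DB offset."""
    return app_clock + offset


def safe_last_end(last_start: datetime, candidate_end: datetime) -> datetime:
    """Never record an end time that falls before the start time."""
    return max(last_start, candidate_end)


# Hypothetical run: the offset is zero when the task starts...
start = adjusted_now(datetime(2021, 11, 7, 2, 0), timedelta(0))

# ...and is recalculated to minus one hour before the task finishes.
raw_end = adjusted_now(datetime(2021, 11, 7, 2, 10), timedelta(hours=-1))

print(raw_end < start)                # True: the unguarded end precedes the start
print(safe_last_end(start, raw_end))  # 2021-11-07 02:00:00
```

    With the guard in place the stored end is never earlier than the start, so a check like lastrun>lastend would no longer flag the task as still running.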

    You can update the lastend and it might cause the cron task to resume. It will depend on the issue as to whether or not the cron task will resume correctly after that (or if a restart will be required).

    ------------------------------
    Steven Shull
    IBM
    ------------------------------



  • 5.  RE: KPIs stopped running

    Posted 11-11-2021 09:41
    Nice tip!

    Sure enough, the last start is 2 AM but the last end is 1:10 AM.  Pretty sure Justin nailed it by attributing this to the time change, which would have happened at 2 AM Sunday.
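
    As a rough illustration of why the time change is a plausible suspect, the Python sketch below shows wall-clock time going backwards across the fall-back hour. It assumes US Central time with fixed offsets (CDT = UTC-5, CST = UTC-6) and made-up start and end times. It is not a reconstruction of what Maximo actually recorded.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed here for US Central time around the 2021 fall-back.
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")

# Hypothetical task: starts at 1:50 AM CDT and runs for 20 minutes.
start = datetime(2021, 11, 7, 1, 50, tzinfo=CDT)
end = (start + timedelta(minutes=20)).astimezone(CST)

# Strip the zone to see what a plain wall-clock timestamp would store.
wall_start = start.replace(tzinfo=None)
wall_end = end.replace(tzinfo=None)

print(end > start)            # True: real time moved forward
print(wall_end < wall_start)  # True: the wall clock went backwards
print(wall_end)               # 2021-11-07 01:10:00
```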

    Sadly, there were others that were in that Result Set. . .so now this becomes a quest to figure out how to get alerted when things get stuck like this, and how to fix it -- better yet, make it automated & self-healing.  A back-end update to change the value of lastend seems to work to get it going again, but if & when we move to SaaS I likely wouldn't have the convenience and freedom to go make these changes.  So. . .an Escalation that runs an Automation Script that finds that same result set where lastrun>lastend, and set the lastend value to (lastrun + 1 millisecond)?  It would be weird to rely on an Escalation here, since part of the root problem here is that there are Escalations failing to run, but it might be a step in the right direction.
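
    The find-and-fix idea above could look roughly like the Python sketch below. It runs against an in-memory SQLite stand-in, not a real Maximo database: the `taskname` column, the sample rows, and the text timestamp format are assumptions for illustration, and only `lastrun` and `lastend` come from the query earlier in the thread. A real Automation Script would go through Maximo's own objects instead of raw SQL.

```python
import sqlite3
from datetime import datetime, timedelta


def find_stuck(conn):
    """Return rows whose last start is later than their last end."""
    return conn.execute(
        "SELECT taskname, lastrun, lastend FROM taskscheduler "
        "WHERE lastrun > lastend ORDER BY taskname"
    ).fetchall()


def repair_stuck(conn):
    """Set lastend to lastrun + 1 millisecond for every stuck row."""
    fixed = []
    for taskname, lastrun, _lastend in find_stuck(conn):
        new_end = datetime.fromisoformat(lastrun) + timedelta(milliseconds=1)
        conn.execute(
            "UPDATE taskscheduler SET lastend = ? WHERE taskname = ?",
            (new_end.isoformat(sep=" ", timespec="milliseconds"), taskname),
        )
        fixed.append(taskname)
    conn.commit()
    return fixed


# Stand-in table with one stuck task and one healthy task (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taskscheduler (taskname TEXT PRIMARY KEY, lastrun TEXT, lastend TEXT)"
)
conn.executemany(
    "INSERT INTO taskscheduler VALUES (?, ?, ?)",
    [
        ("KPINONREALTIME", "2021-11-07 02:00:00.000", "2021-11-07 01:10:00.000"),
        ("HEALTHY_TASK", "2021-11-10 02:00:00.000", "2021-11-10 02:03:00.000"),
    ],
)

stuck_before = [row[0] for row in find_stuck(conn)]
fixed = repair_stuck(conn)
stuck_after = find_stuck(conn)
print(stuck_before, fixed, stuck_after)
```

    One caveat with this approach: a task that is legitimately in the middle of a run also has lastrun later than lastend, so a real version would probably need to skip rows that started only recently.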

    ------------------------------
    Travis Herron
    Pensacola Christian College
    ------------------------------



  • 6.  RE: KPIs stopped running

    Posted 11-11-2021 10:27
    Edited by Steven Shull 11-11-2021 10:36
    For these sorts of things I would suggest alerting only, and ideally from outside of Maximo. We made a REST API call from our monitoring system to Maximo for cron tasks that we were particularly sensitive to (such as the email listener), but there's no reason you couldn't do it more generically. The hard part is understanding the normal execution time of each task to determine when something is abnormal. For example, we didn't generate an alert unless the email listener had been "running" for 10 minutes or more. Normally it would run in less than a few seconds, but sometimes it could take a minute if there was a sufficient volume of emails to process. PM WOGEN could easily take hours in some systems, so you'd have to account for that.
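
    A minimal Python sketch of that per-task threshold logic follows. It only covers the decision step, on data that has already been fetched. The task names, the six-hour limit for PM WOGEN, and the dictionary layout are all hypothetical. The 10-minute default matches the email listener example above.

```python
from datetime import datetime, timedelta

# Alert after 10 minutes by default; long-running tasks get their own limit.
DEFAULT_LIMIT = timedelta(minutes=10)
LIMITS = {"PMWOGEN": timedelta(hours=6)}  # hypothetical value


def overdue_tasks(tasks, now, limits=LIMITS, default=DEFAULT_LIMIT):
    """Return names of tasks that have been 'running' longer than their limit."""
    alerts = []
    for task in tasks:
        # A task counts as running when it has started but not yet ended.
        still_running = task["lastend"] is None or task["lastrun"] > task["lastend"]
        if not still_running:
            continue
        if now - task["lastrun"] >= limits.get(task["name"], default):
            alerts.append(task["name"])
    return alerts


now = datetime(2021, 11, 11, 10, 0)
tasks = [
    # Running for 15 minutes: over the 10-minute default.
    {"name": "EMAILLISTENER", "lastrun": datetime(2021, 11, 11, 9, 45),
     "lastend": datetime(2021, 11, 11, 9, 40)},
    # Running for 2 hours: still within its 6-hour limit.
    {"name": "PMWOGEN", "lastrun": datetime(2021, 11, 11, 8, 0),
     "lastend": datetime(2021, 11, 10, 11, 30)},
    # Finished normally: not running at all.
    {"name": "KPINONREALTIME", "lastrun": datetime(2021, 11, 11, 9, 59),
     "lastend": datetime(2021, 11, 11, 9, 59, 5)},
]

alerts = overdue_tasks(tasks, now)
print(alerts)  # ['EMAILLISTENER']
```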

    There's actually a specific APAR for the daylight savings scenario: https://www.ibm.com/support/pages/apar/IJ12595. I'll see if I can track down what version that is fixed in.

    Looks like my issue was APAR IJ19778 (CRONTASKS STOP SENDING OR RECEIVING MESSAGES FROM SEQUENTIAL JMS QUEUES), which shows as fixed in 7.6.1.2 base. Are you on 7.6.1.2? APAR IJ12595 is listed as being fixed in 7.6.1.3, but I can't see how the fix for my issue wouldn't have addressed both scenarios.

    ------------------------------
    Steven Shull
    IBM
    ------------------------------