17 Replies Latest reply on Dec 16, 2014 6:30 AM by robertcbrowning

    Time zones, DST and gaps in monitoring

    Leon Adato

      I opened a ticket about this, and ultimately it ended up as a feature request.

       

      Before I go tilting at this particular windmill, let me frame the problem, which sits in the middle of multiple pollers, time zones, and DST.

       

      1. Pollers enter data into SW using their current time as the timestamp (except for a very small number of fields that use UTC, but they are extremely limited)
      2. Thus, if you have pollers in multiple places, you effectively can’t correlate the timing of events because they will appear to have happened hours apart.
      3. Even worse, if you have pollers in location A polling devices in location B, you are that much further away from understanding the real time something happened.

        NOTE: we have exactly that. 4 pollers in EST, 4 in CST, and we aren’t fastidious about assigning only CST devices to the CST pollers. Plus, we have devices in China, Puerto Rico, etc. where we have NO pollers.

      4. Then there’s DST. We noticed during the last daylight saving time change that we suddenly had a 1 hour gap in data. Not because we stopped collecting, but because every timestamp “jumped” an hour into the future.
      5. Worse, we have a couple of alerts that are time-aware. Meaning “IF a node doesn’t update in 30 minutes, create a ticket”. Suddenly 700 systems appeared to not have updated in 1 hour.
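To make points 4 and 5 concrete, here is a minimal sketch of how a 30-minute staleness check behaves when samples are stamped with local wall-clock time across the spring change. It is not SolarWinds code: the poller is assumed to be on US Eastern time, the 2014 change-over instant and fixed offsets are hard-coded for illustration, and `poller_wall_clock` and `looks_stale` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a poller on US Eastern time. In 2014 the clocks moved from
# EST (UTC-5) to EDT (UTC-4) at 2014-03-09 07:00 UTC.
SPRING_FORWARD_UTC = datetime(2014, 3, 9, 7, 0, tzinfo=timezone.utc)
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def poller_wall_clock(instant_utc: datetime) -> datetime:
    """Return the naive local time a poller would stamp on a sample."""
    zone = EDT if instant_utc >= SPRING_FORWARD_UTC else EST
    return instant_utc.astimezone(zone).replace(tzinfo=None)


def looks_stale(last_update: datetime, now: datetime,
                limit: timedelta = timedelta(minutes=30)) -> bool:
    """Time-aware alert: has the node gone longer than `limit` without an update?"""
    return now - last_update > limit


# Two polls ten real minutes apart, straddling the spring-forward instant.
last_poll_utc = datetime(2014, 3, 9, 6, 55, tzinfo=timezone.utc)
now_utc = last_poll_utc + timedelta(minutes=10)

last_local = poller_wall_clock(last_poll_utc)  # 01:55 local
now_local = poller_wall_clock(now_utc)         # 03:05 local

print("gap on local timestamps:", now_local - last_local)
print("gap on UTC timestamps:  ", now_utc - last_poll_utc)
print("alert on local time:", looks_stale(last_local, now_local))
print("alert on UTC:       ", looks_stale(last_poll_utc, now_utc))
```

On local timestamps the two polls look 70 minutes apart, so the alert fires even though only 10 minutes passed. The same comparison on UTC values stays quiet.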

       

      My first solution - setting all pollers to the same time zone - resolves most of the issues with 1-3.

       

      But it doesn’t fix the DST shift. What would solve that would be to set all the servers to a non-DST time zone like UTC.

       

      Standing in the way of that is:

      • The time setting for your primary poller can’t be more than 5 minutes off from that of the database server (per SW tech support). Our database cluster hosts multiple other applications and thus we cannot change the time on that system for love, money or a loaded gun.
      • NOTHING in any of the actual SolarWinds displays (graphs, charts, etc) indicates the time zone. So (if we had an independent database server) we could move to UTC, but we would be answering "why does the chart say the error occurred at 2am" type questions until all 30,000 employees at my company had heard the answer at least 5 times each.

       

      So forewarned is forearmed. Right now there is no way to resolve this except for the kissing-your-sister level answer of picking a "real" time zone, and bracing yourself for data gaps and tickets during each daylight saving shift.

        • Re: Time zones, DST and gaps in monitoring
          rob.hock

          Thanks for the concise compilation of distributed time issues. We are moving more towards UTC for relative time, so this pain should be lessened in the future. The DST time-shift is known, and something we need to take care of before the next DST change.

          • Re: Time zones, DST and gaps in monitoring
            robertcbrowning

            Is this now in the mix for a future release or is there a SW recommended work-around?

            • Re: Time zones, DST and gaps in monitoring
              rharland2012

              The silence is deafening.

              • Re: Time zones, DST and gaps in monitoring
                uidzer0

                Does this problem still exist? I just figured out that, because of DST, I must predate my "Unmanage" application by 1 hour to get it to take effect.

                • Re: Time zones, DST and gaps in monitoring
                  Leon Adato

                  For anyone who finds this in their searches, SolarWinds has JUST addressed it as part of the SAM 6.1 hotfix #1. A new version of the jobengine will (somehow - I haven't dug into it yet) allow for DST changes every year.

                   

                  I'm glad SolarWinds finally addressed the issue of DST shift (at least for SAM users; I'm presuming the same hotfix is available for NPM), and I hope that the complete move to UTC is coming soon so that multiple pollers in different time zones won't create conflicting or confusing data.

                  • Re: Time zones, DST and gaps in monitoring
                    robertcbrowning

                    Having a 1 hour gap in data in Spring is one issue, but what about the overlap in Autumn/Fall? Presumably the second hour of metrics overwrites the first hour? That would make for a loss of captured data and consequential regulatory problems.
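The overlap can be sketched as follows. This only illustrates the collision, it does not show what Orion actually does with such rows (the overwrite above is a presumption). US Eastern time in 2014 is assumed, the offsets are hard-coded, and the sample values and the `wall_clock` helper are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern time in 2014, falling back from EDT (UTC-4) to
# EST (UTC-5) at 2014-11-02 06:00 UTC.
FALL_BACK_UTC = datetime(2014, 11, 2, 6, 0, tzinfo=timezone.utc)
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def wall_clock(instant_utc: datetime) -> datetime:
    """Naive local timestamp, as a poller on local time would record it."""
    zone = EST if instant_utc >= FALL_BACK_UTC else EDT
    return instant_utc.astimezone(zone).replace(tzinfo=None)


# Four hypothetical samples, 30 real minutes apart, around the fall-back.
start = datetime(2014, 11, 2, 5, 0, tzinfo=timezone.utc)
samples = [(start + timedelta(minutes=30 * i), 10 * (i + 1)) for i in range(4)]

by_local = {}  # keyed by local wall-clock time
by_utc = {}    # keyed by UTC instant
for instant, value in samples:
    by_local[wall_clock(instant)] = value  # later sample replaces earlier one
    by_utc[instant] = value

print("rows keyed by local time:", len(by_local))
print("rows keyed by UTC:       ", len(by_utc))
```

Local 01:00 and 01:30 each occur twice that night, so a store keyed on local time keeps two of the four samples. Keyed on UTC, all four survive.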

                    Corporate systems have only one database and therefore only one "standard time", but with engines scattered world-wide cross monitoring nodes world-wide. Windows platforms are typically defaulted to whatever the local time is at that data centre and polling engines are either in or out of domains & therefore AD's GPO control, but they are polling devices that are often outside their time zone.  The only viable solution that I can see is to set the local-time per node, rather than per-engine or per SQL. Obviously we need to correlate it all so it should be based on UTC but using offsets and that way DST can become irrelevant.

                    Or is that a feature request?

                      • Re: Time zones, DST and gaps in monitoring
                        Leon Adato

                        I am not sure rob.hock - can you speak to this issue, the hotfix, and what lingering issues we may see?

                          • Re: Time zones, DST and gaps in monitoring
                            rob.hock

                            Unfortunately switching all statistics to UTC is a larger undertaking than one might think. While we have made some improvements under the covers, the data-gap issue persists. The Hotfix was simply a point-solution to prevent the job scheduler service from breaking polling under certain conditions.

                              • Re: Time zones, DST and gaps in monitoring
                                Mike Lomax

                                I understand this is a huge undertaking but I think it is one that really should be considered.  When your monitored environment and supporting staff cross time zones it becomes ugly without this feature.  When you are required to generate reports from the data to deliver to the customer, it gets much uglier.  As a company that monitors nodes all around the world and alerts staff in several countries, if I had figured this out during the evaluation of different software options, this would have been a huge negative for Orion.  Don't get me wrong, I love SolarWinds, NPM, SAM, the response to most feature requests and the Support, but other companies have handled this situation and I think, to stay competitive, SolarWinds needs to step up to that plate as well.

                                 

                                What I am about to say may not match what I have said in the past (don't have the time to review that and my thoughts get updated as I learn and see more about monitoring and Orion).  These may also be things that have already been considered but I will throw my thoughts out here anyways hoping they might help:

                                 

                                First the data should all be recorded in the database using the UTC time zone but, and to robertcbrowning's point above, without DST.  That would avoid the overlap in the DB when the time rolls back in the fall as he mentioned.

                                 

                                Second, add time zone and DST fields to user and node profiles.  The node profile settings would help for conversion of inbound timestamps to the DB in the common format and the user profile settings would help to take the DB recorded timestamp and properly display it for the particular user.

                                 

                                Third, Orion then needs to understand time zones and DST.  Meaning that under the covers, there needs to be code to convert the UTC w/o DST timestamp recorded in the DB to another time zone, with DST taken into account for the output.  The reverse is also true: a timestamp that is gathered or presented to Orion for recording into the DB will need to be converted to UTC w/o DST.

                                 

                                Fourth, for internal Orion generated timestamps, the first timestamp conversion code above would be used to change the display timestamps throughout all Orion, be it webpages, reports, etc., to match the logged on user's profile settings for time zone and DST.  Here I am talking about dates posted to the DB internally by Orion and not those coming in from outside sources.  Inbound timestamps will be discussed next.  Getting to this step is the most important thing on my wish list.

                                 

                                Logging timestamps from outside sources, or inbound timestamps, in the DB as UTC w/o DST would understandably be a bigger challenge.  I would see breaking this into a couple of tasks as well.

                                 

                                Fifth, for output in Application Monitors, I would add a check-box for each output grouping which indicates the expected Statistic value is a timestamp value and require that the date and time follow something like the 24-hour syntax "04/23/2014 14:45" or "2014-04-23 14:45", which I like better and think should be the standard for everything.  As the output value from the monitor script comes back into SAM, the timestamp is converted to UTC w/o DST using the node's TZ/DST settings and placed in the DB.  Or it errors because the timestamp does not follow the official syntax.
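A rough sketch of what that fifth step might look like, assuming the "2014-04-23 14:45" syntax is the one enforced. `NodeProfile` and `monitor_output_to_utc` are hypothetical names, not existing Orion or SAM objects, and the DST setting is reduced to a simple flag rather than real transition rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # the proposed 24-hour syntax


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical per-node settings; not an existing Orion object."""
    standard_offset: timedelta   # offset from UTC without DST
    dst_in_effect: bool = False  # simplification: a flag, not a rule set

    def as_timezone(self) -> timezone:
        shift = timedelta(hours=1) if self.dst_in_effect else timedelta(0)
        return timezone(self.standard_offset + shift)


def monitor_output_to_utc(raw: str, node: NodeProfile) -> datetime:
    """Parse a monitor's timestamp string and normalise it to UTC.

    Raises ValueError when the string does not match TIMESTAMP_FORMAT.
    """
    local = datetime.strptime(raw.strip(), TIMESTAMP_FORMAT)
    return local.replace(tzinfo=node.as_timezone()).astimezone(timezone.utc)


# A node on US Central time while DST is in effect (UTC-6 plus one hour).
central_in_april = NodeProfile(timedelta(hours=-6), dst_in_effect=True)
print(monitor_output_to_utc("2014-04-23 14:45", central_in_april))

try:
    monitor_output_to_utc("04/23/2014 2:45 PM", central_in_april)
except ValueError as err:
    print("rejected:", err)
```

The first call yields 19:45 UTC for 14:45 Central Daylight Time. The second is rejected because it does not follow the agreed syntax.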

                                 

                                Sixth, pulling out and recording timestamps from things like traps and syslogs (I have little experience with syslog at this point) would likely be a greater challenge.  However, another topic on this forum is that of better integration of trap collection into Orion.  One of my wants for that topic is to tie the traps to nodes, which then allows for the conversion of those timestamps using the node's TZ/DST settings.

                                 

                                Seventh would be to then take the properly recorded DB timestamps (in UTC w/o DST) and display them on the webpages, reports, etc. after conversion to the user's own time zone and DST.  Until this phase of displaying the inbound timestamps properly for the user's TZ/DST is accomplished maybe a note could be displayed where items covered under 5 and 6 might appear in the webpages or reports.

                                 

                                Dreaming, I know, but I think those of us affected by this would all be much happier if at least 1 through 4 above were accomplished in the shorter term, with 5 and 6 as longer term goals.

                                 

                                So did I miss anything?  Or is there more that is affected by this that we don't see or that I simply forgot about?

                          • Re: Time zones, DST and gaps in monitoring
                            smartd

                            In the past if the Poller and the Database server were not on the same time zone, the polling status page would show the database out of sync.  Is that still the case?  Since the Poller is built into the NPM install, I'm assuming the primary NPM machine would need to be on UTC, but you could run an additional Web server that is set for local time zone and graphs would plot correctly?

                            • Re: Time zones, DST and gaps in monitoring
                              ttl

                              I don't suppose recording in Epoch time and doing conversion at the user level would be useful / possible?
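For what it's worth, the idea would look roughly like this. It is only a sketch: `to_epoch` and `for_user` are hypothetical helpers, and the viewer's zone is reduced to a fixed UTC offset rather than a full set of DST rules.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(instant: datetime) -> int:
    """Store an aware datetime as whole seconds since 1970-01-01 UTC."""
    return int(instant.timestamp())


def for_user(epoch_seconds: int, user_offset: timedelta) -> str:
    """Render a stored epoch value at the viewer's UTC offset."""
    local = datetime.fromtimestamp(epoch_seconds, tz=timezone(user_offset))
    return local.strftime("%Y-%m-%d %H:%M %z")


# One stored value, shown to two viewers at different offsets.
stored = to_epoch(datetime(2014, 3, 9, 7, 5, tzinfo=timezone.utc))
print(for_user(stored, timedelta(hours=-4)))
print(for_user(stored, timedelta(hours=8)))
```

The stored number never changes with DST or with the poller's location. Only the rendering differs per viewer.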