Photo of the main post office in Dublin when I visited years ago

Not-ignoring daylight savings time for analysis

Mar 11, 2025

No one told me having a kid meant I have to wake up for school every day too. >< Also, TIL that if I adjust the scheduled publish date in Ghost, it resets the time, so this one got sent out late =\ dang it.

Oh, hey, daylight savings time has just hit the US and everyone is slightly more tired than usual and all our meetings with folks in other countries have potentially jumped to a different hour depending on how we set up the meetings. And then we might do it again as other countries start adopting their version of summer time too in the coming weeks. Great fun.

One of the most solid pieces of advice that we give data scientists who are new to the field is to completely avoid having to deal with daylight savings times and time zones altogether because it is impossible to make sense of shifting time regimes without a lot of madness-inducing care. Just stick to unixtime, UTC, or just about anything else that does not include a shifting time scale. Keeping things monotonic and continuous is important for a lot of our analyses, and so we like to make sure things stay that way.

But, unfortunately, not everything can ignore the realities of local time.

Local time is very useful when trying to understand humans. That totally makes sense since local time was invented by humans ages ago based on the movements of the sun. We had to invent modern coordinated time only when we started running trains on shared rails and learned our trains would crash if we didn't coordinate the clocks of distant regions. Things got even more intense once we were able to communicate with places far enough away that time differences mattered.

Local time is human time

There's a surprising lack of information in the sentence "This user clicked on the button at 4pm UTC." If that user is in the Eastern US time zone, it would be around noon for them. But if the user were in India, it'd be late in the evening, and it'd be very late at night for someone in China. Depending on what kind of question you are trying to answer, that information might be extremely important.
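To make that concrete, here is a minimal sketch using Python's standard zoneinfo module (Python 3.9+, and it assumes the system has the tz database available). The date, the zone names, and the helper name local_views are my own picks for illustration, not anything from a real dataset.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# One instant, recorded in UTC: 4pm on an arbitrary date (assumption).
click_utc = datetime(2025, 3, 11, 16, 0, tzinfo=timezone.utc)

# Hypothetical user locations, chosen to match the examples in the text.
user_zones = {
    "Eastern US": "America/New_York",
    "India": "Asia/Kolkata",
    "China": "Asia/Shanghai",
}

def local_views(instant, zones):
    """Return {label: local datetime} for the same instant in each zone."""
    return {label: instant.astimezone(ZoneInfo(name)) for label, name in zones.items()}

views = local_views(click_utc, user_zones)
for label, local in views.items():
    print(f"{label:<10} {local:%Y-%m-%d %H:%M}")
```

On that date the same click lands at noon in New York, 9:30pm in India, and midnight of the next calendar day in China.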

Back when I worked at Meetup, I'd occasionally have to do research on what users were doing on the platform, and the primary activity was getting together for various events. At scale, looking at the local time distribution of events told fascinating stories. On weekdays, most groups would host their events in the evening after work hours, but times would be much more spread out on weekends. Except, you'd find groups of stay-at-home moms and dads that would host their events during the day to arrange playdates for their kids. Tech talk groups would meet right after work, but dating and socializing groups would typically meet later. Photography groups would do their photo walks during the day, often on weekends.

These patterns held true largely across geographies and cultures because they're reflections of the rhythms of modern adult life. We work, we eat, we sleep, and we enjoy ourselves in the gaps between.

You'll also see similar patterns of behaviors with things like internet traffic, electricity and water usage rising during the local daytime and eventually dropping off when people go to sleep. It's everywhere and we all take it largely for granted... until we have to analyze it and things get broken by discontinuities.

Dealing with daylight savings changes

So when we deal with daylight savings changes to local timestamps, we usually see one of two things happen: either we get zero events for an hour because the clocks jumped forward an hour, or we get about double the events for an hour because an hour has been repeated. Under the US DST rules, both changes happen at the stroke of 2AM local time: in spring the clock jumps from 2AM straight to 3AM, and in fall it rolls back from 2AM to 1AM. So the small hours of the morning on changeover days are the local times you have to be the most careful about.

It's important to note here that there's no clever trick to getting these discontinuities to go away. Even if you do all your datetime processing in UTC in an effort to keep the saner time scale in use until the very last moment, the instant you convert a timestamp to the local time zone, you'll reintroduce the discontinuities of DST. The mapping of timestamps just works out that way no matter how hard you try.
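Here's a small sketch of that effect with synthetic data: perfectly even events in UTC (four per hour, an arbitrary assumption), bucketed by local wall-clock hour in New York on the two 2025 changeover dates. The function name local_hour_counts is made up for this example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def local_hour_counts(start_utc, n_hours, tz_name, events_per_hour=4):
    """Bucket evenly spaced UTC events by their local wall-clock date and hour."""
    tz = ZoneInfo(tz_name)
    step = timedelta(hours=1) / events_per_hour
    counts = Counter()
    for i in range(n_hours * events_per_hour):
        local = (start_utc + i * step).astimezone(tz)
        counts[(local.date().isoformat(), local.hour)] += 1
    return counts

# Spring forward: six hours of events starting at local midnight on Mar 9, 2025.
spring = local_hour_counts(
    datetime(2025, 3, 9, 5, 0, tzinfo=timezone.utc), 6, "America/New_York"
)
# Fall back: six hours of events starting at local midnight on Nov 2, 2025.
fall = local_hour_counts(
    datetime(2025, 11, 2, 4, 0, tzinfo=timezone.utc), 6, "America/New_York"
)

for name, counts in (("spring", spring), ("fall", fall)):
    for (day, hour), n in sorted(counts.items()):
        print(name, day, f"{hour:02d}:00", n)
```

The UTC side has no gaps at all, yet the spring run has no 02:00 bucket and the fall run has a 01:00 bucket with twice the events of its neighbors.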

So what's an overworked analyst to do?

First, question whether you need to work at an hourly or sub-hourly time scale that would show such an issue. That is to say, stick to daily or larger durations if you can, or at least multi-hour windows. That would allow you to just ignore DST altogether, since the activity you care about still falls somewhere within the window.
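A rough sketch of daily bucketing, again with synthetic data (one event per hour, which real traffic never is). The helper events_per_local_day is a hypothetical name. It also shows the one thing worth remembering at this scale: the changeover day itself is 23 or 25 hours long, so a perfectly flat event rate gives a slightly different daily total, which is usually small enough to shrug off.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def events_per_local_day(events_utc, tz_name):
    """Count events by the local calendar date they fall on."""
    tz = ZoneInfo(tz_name)
    return Counter(ts.astimezone(tz).date().isoformat() for ts in events_utc)

# Synthetic data: one event per hour covering local Mar 8-10, 2025 in New York.
start = datetime(2025, 3, 8, 5, 0, tzinfo=timezone.utc)  # local midnight, Mar 8
events = [start + timedelta(hours=i) for i in range(71)]

daily = events_per_local_day(events, "America/New_York")
for day, n in sorted(daily.items()):
    print(day, n)
```

No empty or doubled bucket shows up here, just a 23-event day sitting between two 24-event days.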

If ignoring the issue isn't an option, then you can consider fully embracing it. That often means plotting and reporting the data "as-is" and leaving an asterisk at the anomalous parts saying that daylight savings time has caused a data artifact. Since most people are at least somewhat familiar with daylight savings time, it's not too big of a stretch for them to understand and work around the quirk in the data. The danger here, though, is that things can break unexpectedly. For example, if you have an hour-over-hour comparison, it falls apart during the changeover hours. On its own, a bunch of NaN or oddly high/low numbers isn't difficult to spot, but these issues are easy to forget about, and then people reading the reports start doubting whether the numbers are correct.
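If you go the asterisk route, it helps to have the asterisks generated rather than remembered. Below is one possible sketch: a hypothetical helper, dst_artifact_hours, that reports which local hours on a given date are skipped or repeated. It assumes whole-hour shifts that start on the hour, which holds for US zones but not for every zone in the world.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def dst_artifact_hours(year, month, day, tz_name):
    """Return {hour: "skipped" | "repeated"} for local hours distorted by DST.

    Assumes whole-hour shifts that begin on the hour (true for US zones).
    """
    tz = ZoneInfo(tz_name)
    notes = {}
    for hour in range(24):
        first = datetime(year, month, day, hour, tzinfo=tz)  # fold=0 reading
        second = first.replace(fold=1)                       # fold=1 reading
        # A wall time that does not exist will not survive a UTC round trip.
        round_trip = first.astimezone(timezone.utc).astimezone(tz)
        if round_trip.hour != hour:
            notes[hour] = "skipped"
        # A wall time that happens twice has two different UTC offsets.
        elif first.utcoffset() != second.utcoffset():
            notes[hour] = "repeated"
    return notes

spring_notes = dst_artifact_hours(2025, 3, 9, "America/New_York")
fall_notes = dst_artifact_hours(2025, 11, 2, "America/New_York")
normal_notes = dst_artifact_hours(2025, 3, 11, "America/New_York")
print(spring_notes, fall_notes, normal_notes)
```

The returned hours can feed chart annotations, or be used to mask the hour-over-hour comparison for just those buckets so the report shows a footnote instead of a NaN.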

The final strategy is to treat these things like missing/erroneous data (which they are a special case of). That means finding a way to repair the broken data that is acceptable for your analysis. For example, if it were really important, you could go back and manually convert the timestamps to values as if DST hadn't come into effect. It'll mess up the timescale of any analysis you make since you've now become disconnected from 'clock time', but you might accept the compromise that all your analysis is now done in "hours after midnight". Other strategies include using moving averages to smooth things out somewhat, though you need pretty huge windows to do so and it might not be worth it.
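One way the "as if DST hadn't come into effect" repair could look, as a sketch rather than a recommendation: convert to local time, then subtract whatever DST adjustment the tz database says is in effect. The function name to_standard_time is hypothetical, and the result is deliberately a naive datetime because it no longer matches anyone's wall clock.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def to_standard_time(instant_utc, tz_name):
    """Convert a UTC instant to the zone's standard (non-DST) clock.

    Returns a naive datetime, since the result is not real local time.
    """
    local = instant_utc.astimezone(ZoneInfo(tz_name))
    dst_shift = local.dst() or timedelta(0)  # DST adjustment in effect, if any
    return local.replace(tzinfo=None) - dst_shift

# Hourly instants across the spring changeover in New York (synthetic).
spring_start = datetime(2025, 3, 9, 5, 0, tzinfo=timezone.utc)
spring_hours = [
    to_standard_time(spring_start + timedelta(hours=i), "America/New_York").hour
    for i in range(6)
]

# Hourly instants across the fall changeover.
fall_start = datetime(2025, 11, 2, 4, 0, tzinfo=timezone.utc)
fall_hours = [
    to_standard_time(fall_start + timedelta(hours=i), "America/New_York").hour
    for i in range(4)
]
print(spring_hours, fall_hours)
```

The hours come out continuous across both changeovers, at the cost that everything during the DST months reads one hour earlier than the clock on the wall did.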

Just make sure that whatever you do, you do your timezone conversions using the proper library for your language of choice. The tz database is probably THE canonical source of truth for how computers can understand time zones, including extremely detailed and obscure historical time zone changes that might affect some places. It's not as simple as adding/subtracting an offset value per region.
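As a quick illustration of why a fixed offset per region doesn't cut it, this sketch asks the tz database (via Python's zoneinfo) how far London's clock is ahead of New York's on three dates in 2025. The dates and the helper name hours_ahead are my own choices for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def hours_ahead(instant_utc, zone_a, zone_b):
    """How many hours zone_a's wall clock is ahead of zone_b's at this instant."""
    offset_a = instant_utc.astimezone(ZoneInfo(zone_a)).utcoffset()
    offset_b = instant_utc.astimezone(ZoneInfo(zone_b)).utcoffset()
    return (offset_a - offset_b).total_seconds() / 3600

# Before the US change, between the US and UK changes, and after both.
checks = [
    datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2025, 3, 15, 12, 0, tzinfo=timezone.utc),
    datetime(2025, 4, 15, 12, 0, tzinfo=timezone.utc),
]
gaps = [hours_ahead(ts, "Europe/London", "America/New_York") for ts in checks]
for ts, gap in zip(checks, gaps):
    print(ts.date(), gap)
```

The gap is 5 hours, then 4 for the few weeks when the US has switched and the UK hasn't, then 5 again, which is exactly the window when recurring international meetings shift.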


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything
