Ten Python datetime pitfalls, and what libraries are (not) doing about it
It’s no secret that the Python datetime library has its quirks. Not only are there probably more than you think; third-party libraries don’t address most of them! I created a new library to explore what a better datetime library could look like.
💬 Discuss this post on Reddit or Hacker News.
Contents
Before we start
The pitfalls
- Incompatible concepts are squeezed into one class
- Operators ignore Daylight Saving Time (DST)
- The meaning of “naïve” is inconsistent
- Non-existent datetimes pass silently
- Guessing in the face of ambiguity
- Disambiguation breaks equality
- Inconsistent equality within timezone
- Datetime inherits from date
- datetime.timezone isn’t enough for timezone support
- The local timezone is DST-unaware
Takeaways
What’s a pitfall?
Two notes before we start:
- Pitfalls aren’t bugs. They’re cases where datetime behaves in a way that is surprising or confusing. It’s always a bit subjective whether something is a pitfall or not.
- Many pitfalls exist simply because the authors couldn’t possibly anticipate all future needs. Adding big features over 20 years, without breaking compatibility, isn’t easy.
Libraries considered
With that out of the way, these are the third-party datetime libraries I’m looking at in this post:
- arrow: probably the most historically popular datetime library. Its goal is to make datetime easier to use, and to add features that many people feel are missing from the standard library.
- pendulum: the only library that rivals arrow in popularity. It has similar goals, while explicitly improving on arrow’s handling of Daylight Saving Time (DST).
- DateType: a library that allows type-checkers to distinguish between naïve and aware datetimes. It doesn’t change the runtime behavior of datetime.
- heliclockter: a young library that offers datetime subclasses for UTC, local, and zoned datetimes.
These libraries I’m not looking at:
- pytz and python-dateutil, which aren’t (full) datetime replacements
- delorean, maya, and moment, which all appear abandoned
Now: on to the pitfalls!
1. Incompatible concepts are squeezed into one class
It’s an infamous pain point that a datetime instance can be either naïve or aware, and that the two can’t be mixed. In any complex codebase, it’s difficult to be sure you won’t accidentally mix them without actually running the code. As a result, you end up writing redundant runtime checks, or hoping all developers diligently read the docstrings.
from datetime import datetime
# Naïve or aware? No way to tell...
def plan_mission(launch_utc: datetime) -> None: ...
There’s also the question of whether distinguishing aware from naïve is enough, since the “aware” category actually contains several different kinds of datetimes. Although they are compatible, UTC/fixed-offset datetimes and IANA-timezone datetimes have notably different semantics, for example when it comes to ambiguity.
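Without type-level support, the usual workaround is a runtime check at the boundary of your code. The sketch below is one way to do that with only the standard library. AwareDatetime and as_aware are hypothetical names, and the NewType marker only helps if you actually run a type-checker such as mypy.

```python
from datetime import datetime, timezone
from typing import NewType

# Hypothetical marker type: a datetime that has already been checked to be aware.
AwareDatetime = NewType("AwareDatetime", datetime)


def as_aware(d: datetime) -> AwareDatetime:
    """Return d unchanged if it is aware, otherwise raise ValueError."""
    # The datetime docs define "aware" as: tzinfo is set AND utcoffset() is not None.
    if d.tzinfo is None or d.utcoffset() is None:
        raise ValueError(f"expected an aware datetime, got {d!r}")
    return AwareDatetime(d)


def plan_mission(launch_utc: AwareDatetime) -> None:
    # A type-checker now rejects a plain datetime here; callers must go through as_aware().
    print("launching at", launch_utc.astimezone(timezone.utc).isoformat())


plan_mission(as_aware(datetime(2024, 1, 1, 12, tzinfo=timezone.utc)))
```

The check still happens at runtime; the NewType just records, for the type-checker, that it has been done.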
What’s being done about it?
- heliclockter has separate classes for local, zoned, and UTC datetimes.
- DateType allows type-checkers to distinguish naïve from aware datetimes.
- arrow and pendulum still have one class for both naïve and aware.
2. Operators ignore Daylight Saving Time (DST)
Given that datetime supports timezones with DST transitions, you’d reasonably expect the +/- operators to take them into account, but they don’t!
from datetime import datetime
from zoneinfo import ZoneInfo
paris = ZoneInfo("Europe/Paris")
# On the eve of moving the clock forward
bedtime = datetime(2023, 3, 25, 22, tzinfo=paris)
wake_up = datetime(2023, 3, 26, 7, tzinfo=paris)
# It says 9 hours, but it's actually 8!
# (because we skipped directly from 2am to 3am due to DST)
sleep = wake_up - bedtime
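One standard-library workaround, shown here as a sketch, is to convert both operands to UTC before subtracting. Subtraction only falls back to wall-clock arithmetic when both datetimes share the same tzinfo object, and since UTC has no DST, wall-clock arithmetic in UTC gives the real elapsed time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")
bedtime = datetime(2023, 3, 25, 22, tzinfo=paris)
wake_up = datetime(2023, 3, 26, 7, tzinfo=paris)

# Same tzinfo object: Python subtracts wall-clock times and ignores the DST change.
wall_clock = wake_up - bedtime
# Convert both to UTC first: UTC has no DST, so the difference is the real elapsed time.
elapsed = wake_up.astimezone(timezone.utc) - bedtime.astimezone(timezone.utc)

print(wall_clock)  # 9:00:00
print(elapsed)     # 8:00:00
```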
What’s being done about it?
- pendulum explicitly fixes this issue
- heliclockter, arrow, and DateType don’t address it
3. The meaning of “naïve” is inconsistent
In various parts of the standard library, “naïve” datetimes are interpreted differently. Ostensibly, “naïve” means “detached from the real world”, but in the datetime library it is often implicitly treated as local time. Confusingly, it is sometimes treated as UTC[1], while in other places it is treated as neither!
import email.utils
from datetime import UTC, datetime  # UTC requires Python 3.11+
# a naïve datetime
d = datetime(2024, 1, 1)
# here: treated as a local time
d.timestamp()
d.astimezone(UTC)
# here: assumed UTC
d.utctimetuple()
email.utils.format_datetime(d)
datetime.utcnow()
# here: neither! (error)
d >= datetime.now(UTC)
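Since each API picks its own interpretation, a safer habit is to state the interpretation yourself by attaching a timezone before doing anything else. A minimal standard-library sketch; note that the exact local-time values depend on the machine’s system timezone, so only their relationships are printed.

```python
from datetime import datetime, timezone

d = datetime(2024, 1, 1)  # naïve

# Explicitly say "this wall time is in UTC"...
as_utc = d.replace(tzinfo=timezone.utc)
# ...or explicitly say "this wall time is in the system's local timezone".
as_local = d.astimezone()

print(as_utc.timestamp())  # 1704067200.0, regardless of the machine's timezone
# timestamp() on the naïve value also assumes local time, so these agree:
print(as_local.timestamp() == d.timestamp())  # True
# utctimetuple() on the naïve value instead matches the explicit UTC version:
print(d.utctimetuple() == as_utc.utctimetuple())  # True
```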
What’s being done about it?
- While pendulum and arrow do discourage using naïve datetimes, they still support the same inconsistent semantics.
- DateType and heliclockter don’t address this
4. Non-existent datetimes pass silently
When the clock in a timezone is set forward, a “gap” is created. For example, if DST moves the clock forward from 2am to 3am, the time 2:30am is skipped. The standard library doesn’t warn you when you create such a non-existent time. As soon as you operate on these objects, you run into problems.
from datetime import datetime
from zoneinfo import ZoneInfo
paris = ZoneInfo("Europe/Paris")
# This time doesn't exist on this date
d = datetime(2023, 3, 26, 2, 30, tzinfo=paris)
# No timestamp exists, so it takes another one from the future
t = d.timestamp()
datetime.fromtimestamp(t, tz=paris) == d # False!?
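The standard library has no built-in check for this, but one can be sketched: a wall time exists exactly when converting it to UTC and back reproduces the same wall time. The function name is_nonexistent is hypothetical, and the approach assumes an aware datetime whose tzinfo (such as ZoneInfo) implements fromutc() correctly.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def is_nonexistent(d: datetime) -> bool:
    """True if d's wall time falls in a gap (e.g. skipped when DST starts)."""
    # Round-trip through UTC: an existing wall time comes back unchanged,
    # while a skipped one is shifted to a different wall time.
    round_trip = d.astimezone(timezone.utc).astimezone(d.tzinfo)
    # Compare as naïve values so that only the wall time matters.
    return round_trip.replace(tzinfo=None) != d.replace(tzinfo=None)


paris = ZoneInfo("Europe/Paris")
print(is_nonexistent(datetime(2023, 3, 26, 2, 30, tzinfo=paris)))  # True
print(is_nonexistent(datetime(2023, 3, 26, 3, 30, tzinfo=paris)))  # False
```

A stricter design would raise instead of returning a flag, which is what the standard library’s constructor could have done.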
What’s being done about it?
- pendulum replaces the current silent behavior with another: it fast-forwards to a valid time without warning.
- arrow, DateType and heliclockter don’t address this issue
5. Guessing in the face of ambiguity
When the clock in a timezone is set backwards, an ambiguity is created. For example, if DST sets the clock one hour back at 3am, the time 2:30am exists twice: before and after the change. The fold attribute was introduced to resolve these ambiguities.
The problem is that there is no objective default value for fold: whether you want the “earlier” or “later” option will depend on the particular context. For backwards compatibility, the standard library defaults to 0, which has the effect of silently assuming that you want the earlier occurrence[2].
from datetime import datetime
from zoneinfo import ZoneInfo
paris = ZoneInfo("Europe/Paris")
# Guesses your intent without warning
d = datetime(2023, 10, 29, 2, 30, tzinfo=paris)
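A stricter alternative is to refuse to guess. As a sketch, ambiguity can be detected by comparing the UTC offsets the zone reports for fold=0 and fold=1: when the clock goes back, fold=0 (the earlier occurrence) has the larger offset. The helper names is_ambiguous and strict_localize are hypothetical.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo


def is_ambiguous(d: datetime) -> bool:
    """True if d's wall time occurs twice, e.g. when DST ends."""
    # In a repeated hour the offset drops, so fold=0 (before) > fold=1 (after).
    # In a gap the order is reversed, and outside transitions they are equal.
    return d.replace(fold=0).utcoffset() > d.replace(fold=1).utcoffset()


def strict_localize(d: datetime, fold: Optional[int] = None) -> datetime:
    """Return d, but raise instead of silently picking fold=0 for ambiguous times."""
    if is_ambiguous(d):
        if fold is None:
            raise ValueError(f"{d} is ambiguous; pass fold=0 (earlier) or fold=1 (later)")
        return d.replace(fold=fold)
    return d


paris = ZoneInfo("Europe/Paris")
d = datetime(2023, 10, 29, 2, 30, tzinfo=paris)
later = strict_localize(d, fold=1)
print(later.utcoffset())  # 1:00:00 (CET, after the clock went back)
```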
What’s being done about it?
- pendulum also guesses, but rather arbitrarily decides that 1 is the better default[3].
- arrow, DateType and heliclockter don’t address the issue.
6. Disambiguation breaks equality
Even though fold was introduced to disambiguate times, equality comparisons of disambiguated times across timezones always evaluate to False, for backwards-compatibility reasons.
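from datetime import UTC, datetime  # UTC requires Python 3.11+
from zoneinfo import ZoneInfo
paris = ZoneInfo("Europe/Paris")
# A properly disambiguated time...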
d = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)
d_utc = d.astimezone(UTC)
d_utc.timestamp() == d.timestamp() # True: same moment in time
d_utc == d # False!?
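Until this is fixed, one workaround is to compare instants rather than datetimes, by converting both sides to UTC first. Once both share the timezone.utc object, they are compared by their UTC wall time, which is never ambiguous. A sketch, with same_instant as a hypothetical helper:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def same_instant(a: datetime, b: datetime) -> bool:
    """Compare two aware datetimes by the moment they represent."""
    # Both sides end up with the identical timezone.utc object,
    # so they are compared by their (unambiguous) UTC wall time.
    return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)


paris = ZoneInfo("Europe/Paris")
d = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)
d_utc = d.astimezone(timezone.utc)

print(d_utc == d)              # False
print(same_instant(d_utc, d))  # True
```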
What’s being done about it?
- None of the libraries addresses this issue
7. Inconsistent equality within timezone
In a mirror image of the previous pitfall, there is a false positive when comparing two datetimes with the exact same tzinfo object. In that case, they are compared by their “wall time”. This is mostly the same, except when fold is involved…
from datetime import datetime
from zoneinfo import ZoneInfo
paris = ZoneInfo("Europe/Paris")
# two times one hour apart (due to DST transition)
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)
earlier.timestamp() == later.timestamp()  # False, as expected
earlier == later  # True!?
Remember I said exact same tzinfo object? If you compare with the same timezone, but you get its object from dateutil.tz instead of ZoneInfo, you’ll get a different result!
from dateutil import tz
later2 = later.replace(tzinfo=tz.gettz("Europe/Paris"))
earlier == later2 # now false
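The same wall-time rule applies to ordering, not just equality, so sorting datetimes that share a tzinfo object can silently produce the wrong chronological order around a DST transition. A short standard-library sketch of the effect, and of sorting by the UTC instant instead:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)  # 00:30 UTC
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)    # 01:30 UTC

# Same tzinfo object: ordering uses wall time and ignores fold, so the two
# compare as equal and the (stable) sort simply keeps the input order.
print([x.fold for x in sorted([later, earlier])])  # [1, 0]: not chronological

# Sorting by the UTC instant gives the chronological order.
by_instant = sorted([later, earlier], key=lambda x: x.astimezone(timezone.utc))
print([x.fold for x in by_instant])  # [0, 1]
```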
What’s being done about it?
- None of the libraries addresses this issue
8. Datetime inherits from date
You may be surprised to know that datetime is a subclass of date. This doesn’t seem problematic at first, but it leads to odd behavior. Most notably, the fact that date and datetime cannot be compared violates basic assumptions of how subclasses should work. The datetime/date inheritance is now widely considered to be a design flaw in the standard library.
from datetime import date, datetime
# Breaks on a datetime, even though it's a subclass
def is_future(d: date) -> bool:
    return d > date.today()
# Some methods inherited from `date` don't make sense
datetime.today()  # fun exercise: what does this return?
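When a function accepts a date, one defensive sketch is to normalize the argument explicitly. Because isinstance(d, date) is also True for a datetime, the datetime check has to come first. as_plain_date is a hypothetical helper name.

```python
from datetime import date, datetime


def as_plain_date(d: date) -> date:
    """Drop the time part if d is actually a datetime."""
    # Order matters: every datetime is also an instance of date.
    return d.date() if isinstance(d, datetime) else d


def is_future(d: date) -> bool:
    return as_plain_date(d) > date.today()


print(isinstance(datetime(2000, 1, 1), date))  # True
print(is_future(datetime(2000, 1, 1, 12)))     # False, instead of a TypeError
print(is_future(date(2999, 1, 1)))             # True
```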
What’s being done about it?
- DateType was explicitly developed to fix this inheritance relationship at type-checking time.
- arrow, pendulum, and heliclockter don’t address the issue. Their datetime classes all inherit from datetime (and thus also date).
9. datetime.timezone isn’t enough for timezone support
OK, so this is maybe something you learn once and then never forget. But it’s still confusing that datetime.timezone is only for fixed offsets, and that you need ZoneInfo to express real-world timezone behavior with DST transitions. For beginners who don’t know the difference, this is an unfortunate trap.
from datetime import timezone, datetime, timedelta
from zoneinfo import ZoneInfo
# Wrong: it's a fixed offset only valid in winter!
paris_tz = timezone(timedelta(hours=1), "CET")
# Correct: accounts for all timezone changes
paris_tz = ZoneInfo("Europe/Paris")
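The difference is easy to observe by asking each tzinfo for its UTC offset on a summer date. A minimal standard-library sketch:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

fixed = timezone(timedelta(hours=1), "CET")
paris = ZoneInfo("Europe/Paris")

summer = datetime(2023, 7, 1, 12)
print(summer.replace(tzinfo=fixed).utcoffset())  # 1:00:00 (always, even in summer)
print(summer.replace(tzinfo=paris).utcoffset())  # 2:00:00 (CEST applies in July)
```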
What’s being done about it?
- Both arrow and pendulum side-step this issue by specifying timezones as strings instead of requiring a special class instance.
- heliclockter and DateType don’t address this issue
10. The local timezone is DST-unaware
Calling astimezone() without arguments gives you the time in the local system timezone. However, it returns it as a fixed offset (datetime.timezone) instead of a full timezone (ZoneInfo) that knows about DST transitions. In Paris, for example, astimezone() returns a fixed offset of UTC+1 or UTC+2 (depending on whether it’s winter or summer) instead of the full Europe/Paris timezone.
from datetime import datetime
# you think you've got the local timezone (on a system set to Paris time)
my_tz = datetime(2023, 1, 1).astimezone().tzinfo
# but you actually only have the wintertime variant
print(repr(my_tz))  # datetime.timezone(datetime.timedelta(seconds=3600), 'CET')
datetime(2023, 7, 1, tzinfo=my_tz)  # not valid for summer!
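If you only need the local offset for one specific moment, calling astimezone() on that moment (instead of reusing a stored tzinfo) gives the right answer; for the full rules you need the IANA name. The sketch below is Unix-only (time.tzset() is not available on Windows) and assumes the system has timezone data installed. It sets the TZ environment variable to simulate a machine in Paris, which changes the local timezone for the whole process.

```python
import os
import time
from datetime import datetime
from zoneinfo import ZoneInfo

# Simulate a machine in Paris (Unix only; affects the whole process).
os.environ["TZ"] = "Europe/Paris"
time.tzset()

winter_tz = datetime(2023, 1, 1).astimezone().tzinfo  # fixed UTC+1 "CET"
summer = datetime(2023, 7, 1, 12)

# Reusing the stored fixed offset: wrong for July.
print(summer.replace(tzinfo=winter_tz).utcoffset())  # 1:00:00
# Calling astimezone() again for this moment: the offset is recomputed.
print(summer.astimezone().utcoffset())  # 2:00:00
# A full IANA timezone knows the rules for any date.
print(summer.replace(tzinfo=ZoneInfo("Europe/Paris")).utcoffset())  # 2:00:00
```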
What’s being done about it?
- pendulum and arrow have methods to convert to the full local timezone.
- heliclockter has a local datetime type with the same issue, although a fix is in the works.
- DateType doesn’t address this issue
Datetime library scorecard
Below is a summary of how the libraries address the pitfalls (✅) or not (❌).
Pitfall | Arrow | Pendulum | DateType | Heliclockter |
---|---|---|---|---|
aware/naïve in one class | ❌ | ❌ | ✅ | ✅ |
Operators ignore DST | ❌ | ✅ | ❌ | ❌ |
Unclear “naïve” semantics | ❌ | ❌ | ❌ | ❌ |
Silent non-existence | ❌ | ❌ | ❌ | ❌ |
Guesses on ambiguity | ❌ | ❌ | ❌ | ❌ |
Disambiguation breaks equality | ❌ | ❌ | ❌ | ❌ |
Inconsistent equality within zone | ❌ | ❌ | ❌ | ❌ |
datetime inherits from date | ❌ | ❌ | ✅ | ❌ |
timezone isn’t enough for timezone support | ✅ | ✅ | ❌ | ❌ |
DST-unaware local timezone | ✅ | ✅ | ❌ | ❌ |
Why should you care?
The pitfalls roughly fall into two categories: confusing design and surprising edge cases. Here is why you should care about both.
Confusing design
Confusing design is the larger problem, because it amplifies the biggest source of bugs: human error. While good design helps minimize the chance of mistakes, bad design introduces more opportunities for them. Looking at other languages, it’s clear that better designs are possible. Java, C#, and Rust all have distinct classes for naïve and aware datetimes (and more). We can also see that redesigns are worth the substantial effort: Java replaced its original date and time classes with java.time, a redesign based on Joda-Time, and JavaScript is modernizing as well with the Temporal proposal. Will Python’s datetime be left behind?
Surprising edge cases
Because these pitfalls are rare, you may think they’re not worth worrying about. After all, DST transitions only represent about 0.02% of the year. While this sentiment is understandable, I’d argue that the opposite is true:
- Getting timezones right is one of the main reasons a datetime library exists. If it can’t do that reliably, what’s the point?
- Rare cases are the most dangerous: they are the ones you’re least likely to test, and they allow bad actors to trip up your code.
- Rare is still too common for such a fundamental concept as time.
Would you run your business on numpy if it had a 0.02% chance of returning the wrong result? Would you accept a language in which 1 in 4000 booleans would arbitrarily be flipped? There is no reason why these pitfalls shouldn’t be corrected.
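As a rough sanity check of the 0.02% figure, assuming it counts the two irregular hours per year in a zone with DST (the skipped hour in spring and the repeated hour in autumn):

$$\frac{2\ \text{h}}{365 \times 24\ \text{h}} = \frac{2}{8760} \approx 0.023\,\% \approx \frac{1}{4380}$$

That is the same order of magnitude as the 1-in-4000 comparison.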
Imagining a solution
Inspired by these findings, I created a new library to explore what a better datetime library could look like. Here is how it addresses the pitfalls:
- It has distinct classes for the most common use cases
(note: the types have been updated since the original article):
from whenever import (
    # In case you don't care about timezones
    Instant,
    # Simple localization sans DST
    OffsetDateTime,
    # Full-featured IANA timezones
    ZonedDateTime,
    # The current system timezone
    SystemDateTime,
    # 'Naive' local times without a timezone
    LocalDateTime,
)
- Addition and subtraction take DST into account.
- Naïve is always naïve. UTC and local time have their own separate classes.
- Creating non-existent datetimes raises an exception.
- Ambiguous datetimes must be explicitly disambiguated.
ZonedDateTime(
    2023, 1, 1,
    tz="Europe/Paris",
)  # ok: not ambiguous
ZonedDateTime(
    2023, 10, 29, 2,
    tz="Europe/Paris",
)  # ERROR: ambiguous!
ZonedDateTime(
    2023, 10, 29, 2,
    tz="Europe/Paris",
    disambiguate="later",
)  # that's better!
- Disambiguated datetimes work correctly in comparisons.
- Aware datetimes are equal if they occur at the same moment. No exceptions.
a == b  # always equivalent to: a.instant() == b.instant()
- The datetime classes don’t inherit from date.
- IANA timezones are used everywhere; no separate classes are needed.
- Local datetimes handle DST transitions correctly.
Changelog
See the git history for exact changes to this article since initial publication.
2024-02-01 18:14:00+01:00
- Clarified wording and code comments in pitfall #3.
2024-02-02 10:13:00+01:00
- Clarified wording around timezones and IANA tz database in pitfall #9, and throughout the article.
- Added Reddit link
2024-02-13 08:40:00+01:00
- Clarified wording on distinguishing “aware” types in pitfall #1.
- Added note about RFC 5545 in pitfall #5.
2024-02-18 20:28:00+01:00
- Added Hacker News link
- Clarification in pitfall #4, fixed code example
- Added non-emoji text to scorecard for systems that don’t support it
2024-02-18 21:10:00+01:00
- A better solution for emoji
2024-10-03 19:15:00+02:00
- Updated the types in the example code to match the current version of the library
[1] In the standard library, methods like utcnow() are slowly being deprecated, but many UTC-assuming parts remain.
[2] This does coincide with RFC 5545, but this is probably coincidental. PEP 495 doesn’t mention RFC 5545, and its semantics aren’t followed in other areas of the standard library.
[3] Interestingly, pendulum used to have an explicit dst_rule parameter that was silently removed in 3.0.