Python str Collection Gotchas

Published on Friday, February 24, 2023
Tags: python

We have been slowly adding Python type hints [1] to Synapse and have made great progress (see some of our motivation). Through this process we have learned a lot about Python and type hints. One bit that was unexpected is that many of the abstract base classes representing groups of str instances also match an individual str instance. This has resulted in more than one real bug for us [2]: a function which has parameter of type Collection[str] was called with a str, for example [3]:

def send(event: Event, destinations: Collection[str]) -> None:
    """Send an event to a set of destinations."""
    for destination in destinations:
        # Do some HTTP.
        ...

def create_event(sender: str, content: str, room: Room) -> None:
    """Create & send an event."""
    event = Event(sender, content)
    send(event, "matrix.org")

The correct version should call send with a list of destinations instead of a single one. The “s” at the end of “destinations” takes on quite a bit of importance! See the fix:

@@ -7,5 +7,5 @@
   def create_event(sender: str, content: str, room: Room) -> None:
       """Create & send an event."""
       event = Event(sender, content)
-      send(event, "matrix.org")
+      send(event, ["matrix.org"])

A possible solution is redefine the destinations parameter as a List[str], but this forces the caller to convert a set or tuple to a list (meaning iterating, allocate memory, etc.) or maybe using a cast(...) (and thus losing some of the protections from type hints). As a team we have a desire to keep the type hints of function parameters as broad as possible.

Put another way, str is an instance of Collection[str], Container[str], Iterable[str], and Sequence[str]. [4] [5]

Since our type hints are only used internally we do not need to worry too much about accepting exotic types and eventually came up with StrCollection:

# Collection[str] that does not include str itself; str being a Sequence[str]
# is very misleading and results in bugs.
StrCollection = Union[Tuple[str, ...], List[str], AbstractSet[str]]

This covers lists, tuples, sets, and frozen sets of str, the one case it does not seem to cover well is if you are using a dictionary as an Iterable[str], the easy workaround there is to call keys() on it to explicitly convert to a KeysView, which inherits from Set.

[1]Looking at the commits to mypy.ini is probably the best way to see progress.
[2]matrix-org/synapse#14809 is our tracking issue for fixing this, although matrix-org/synapse#14880 shows a concrete bug fix which occurred.
[3]This is heavily simplified, but hopefully illustrates the bug!
[4]And Reversible[str], but I don’t think I have ever seen that used and I think it less likely to introduce a bug.
[5]bytes doesn’t have quite the same issue, but it might be surprising that bytes is an instance of Collection[int], Container[int], Iterable[int], and Sequence[int]. I find this less likely to introduce a bug.