Erik Ramsgaard Wognsen

Thoughts & technology

Taming Python with Dataclasses

The Stack Overflow blog recently published the article Minimizing the downsides of dynamic programming languages (such as JavaScript, Python, Ruby, PHP) from which I’ll point out:

  • “Type hints are the most obvious way to make a dynamic language more static. In effect, you get the best of both worlds: you can write dynamic code but are required to be more careful about what types you expect to get and use at any point.”

  • “Dictionaries are for unknown data.” In other words, if you know while writing the code which keys are going to be in your dictionary at runtime, you should use an interface (TypeScript), struct (Ruby), or dataclass (Python), instead of a plain dictionary/object.

This reminded me that if you’re a Python programmer, you should definitely know about dataclasses (since Python 3.7, 2018) because they do both of the above, as fields of the dataclass are specified using type annotation syntax.

For example, if you write:

from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity: int = 0

Then the @dataclass decorator adds something like the following to your class at runtime.

def __init__(self, name: str, unit_price: float, quantity: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity = quantity

def __repr__(self):
    return (f'InventoryItem(name={repr(self.name)}, '
            f'unit_price={repr(self.unit_price)}, '
            f'quantity={repr(self.quantity)})')

def __eq__(self, other):
    if other.__class__ is not self.__class__:
        return NotImplemented
    return ((self.name, self.unit_price, self.quantity) ==
            (other.name, other.unit_price, other.quantity))
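As a quick check of the generated methods, the short script below exercises __init__, __repr__ and __eq__ on the InventoryItem class from above. It is only a sketch: the item values are made up for illustration, and the expected output in the comments assumes the class is defined at module level, since the generated __repr__ uses the class's qualified name.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity: int = 0


# __init__: quantity falls back to its default of 0
item = InventoryItem("widget", 2.5)

# __repr__: every field is shown by name
print(item)  # InventoryItem(name='widget', unit_price=2.5, quantity=0)

# __eq__: compares field values, not object identity
same = InventoryItem("widget", 2.5, 0)
print(item == same)  # True
print(item is same)  # False

# __eq__ only compares against the same class, so a tuple with equal values is not equal
print(item == ("widget", 2.5, 0))  # False
```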

Dataclasses can do more than that, but those are the basics. Note that the dataclass decorator itself doesn’t check the types (str, float, int); apart from a few special cases such as ClassVar, it mainly uses the type annotations as a concise syntax to specify the names of the fields. But since dataclasses are so useful, they become a kind of “gateway drug” to using and learning more about Python type annotations.
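To see that the annotations are not enforced at runtime, here is a minimal sketch with deliberately wrong, purely illustrative argument values. The constructor accepts them without complaint; catching such mistakes is the job of a static type checker such as mypy, which understands dataclasses and would typically flag the call below.

```python
from dataclasses import dataclass, fields


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity: int = 0


# The annotations declare the fields, but nothing checks the values at runtime
odd = InventoryItem(name=42, unit_price="cheap", quantity=None)
print(odd)  # InventoryItem(name=42, unit_price='cheap', quantity=None)

# The annotations are still kept as field metadata (plain classes here, since
# this module does not use "from __future__ import annotations")
for f in fields(InventoryItem):
    print(f.name, f.type)
```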

Magic

The Stack Overflow blog post then seems to backpedal with the recommendation “Be more explicit than you need to be”. After all, having the dunder methods (__init__ etc.) generated automatically is less explicit than writing them out yourself. But the blog post clarifies that “The only real things that should ever be doing metaprogramming or reflection are frameworks themselves” which makes it ok again to use dataclasses since they are part of the standard library.

I agree; “magic” stuff should mainly be done by frameworks, and only when the benefits are tangible. And I think dataclasses are so useful (and again, part of the standard library) that it’s ok to expect readers of your code to know or be willing to learn about them.

However, in combination with other slightly magical things, I was recently wondering if the code I was writing was a bit “too magic”:

from dataclasses import dataclass
from rest_framework import serializers

@dataclass
class Thing:
    my_foo: int

@dataclass
class SpecialThing(Thing):
    my_bar: int

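class SpecialThingSerializer(serializers.Serializer):
    # External (JSON) names on the left, dataclass attribute names in source=
    MyFoo = serializers.IntegerField(source='my_foo')
    MyBar = serializers.IntegerField(source='my_bar')

    def create(self, validated_data):
        # Called by serializer.save(); validated_data is keyed by the source names
        return SpecialThing(**validated_data)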

This code can deserialize a JSON object such as {"MyFoo": 1, "MyBar": 2} into my (PEP 8-compliant) dataclass, producing an object whose representation is SpecialThing(my_foo=1, my_bar=2). The overall journey should not be too surprising, but the individual steps along the way are not very explicit:

  • The Django REST framework (DRF) serializer produces the validated_data dict {'my_foo': 1, 'my_bar': 2} because, when working in reverse (deserializing), each field's source name becomes the target key, so MyFoo becomes my_foo, etc.
  • This dict is unpacked with the ** operator into keyword arguments to instantiate SpecialThing.
  • The auto-generated SpecialThing.__init__ accepts both keyword arguments, even though my_foo is inherited from the Thing dataclass, because a dataclass also collects the fields of its dataclass base classes.
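The three steps can be reproduced without Django REST framework. The sketch below is not DRF's implementation: FIELD_SOURCES and deserialize are hypothetical stand-ins for what the serializer's source= declarations achieve, and they skip the validation and type coercion DRF performs. It only shows the key renaming, the ** unpacking, and the inherited field at work.

```python
import json
from dataclasses import dataclass


@dataclass
class Thing:
    my_foo: int


@dataclass
class SpecialThing(Thing):
    my_bar: int


# Hypothetical stand-in for the serializer's field declarations:
# external JSON key -> dataclass field name (what source= expresses in DRF)
FIELD_SOURCES = {"MyFoo": "my_foo", "MyBar": "my_bar"}


def deserialize(raw: str) -> SpecialThing:
    """Hypothetical helper: no validation or type coercion, unlike DRF."""
    payload = json.loads(raw)
    # Step 1: rename the keys; keys without a declared field are skipped
    validated_data = {
        FIELD_SOURCES[key]: value
        for key, value in payload.items()
        if key in FIELD_SOURCES
    }
    # Steps 2 and 3: unpack the dict as keyword arguments; the generated
    # __init__ accepts my_foo (inherited from Thing) as well as my_bar
    return SpecialThing(**validated_data)


thing = deserialize('{"MyFoo": 1, "MyBar": 2}')
print(thing)  # SpecialThing(my_foo=1, my_bar=2)
```

Because the generated __init__ lists the inherited my_foo before my_bar, the positional call SpecialThing(1, 2) would build the same object.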

I love how compact the code is, but I also have to acknowledge that it might not be clear to Python novices without the above explanation. Still, in the bigger picture I think this example stays within the limits of what is reasonable because:

  • The overall purpose of the code (deserialization) is fairly straightforward once you are in the context of Django REST framework.
  • The code doesn’t use programmatically constructed variable names, so it passes the grep test, as mentioned in the Stack Overflow blog post.
  • The magic is done with the standard library and DRF, the main framework of the codebase.

In the end, you should write code that makes sense to the people who are going to read it.
