The Stack Overflow blog recently published the article Minimizing the downsides of dynamic programming languages (such as JavaScript, Python, Ruby, PHP) from which I’ll point out:
“Type hints are the most obvious way to make a dynamic language more static. In effect, you get the best of both worlds: you can write dynamic code but are required to be more careful about what types you expect to get and use at any point.”
“Dictionaries are for unknown data.” In other words, if you know while writing the code which keys are going to be in your dictionary at runtime, you should use an interface (TypeScript), struct (Ruby), or dataclass (Python), instead of a plain dictionary/object.
This reminded me that if you’re a Python programmer, you should definitely know about dataclasses (since Python 3.7, 2018) because they do both of the above, as fields of the dataclass are specified using type annotation syntax.
For example, if you write:
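The original code block did not survive extraction; a minimal sketch of what a dataclass definition looks like (the class and field names here are hypothetical, chosen to match the str/float/int types mentioned below):

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    name: str          # fields are declared with type annotation syntax
    unit_price: float
    quantity: int = 0  # a field can also carry a default value
```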
Then the @dataclass decorator adds something like the following to your class at runtime.
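The expanded code block is also missing from this copy. For a hypothetical dataclass with fields name: str, unit_price: float, and quantity: int = 0, the generated code is roughly equivalent to:

```python
class InventoryItem:
    # @dataclass generates __init__ from the field annotations
    def __init__(self, name: str, unit_price: float, quantity: int = 0):
        self.name = name
        self.unit_price = unit_price
        self.quantity = quantity

    # ...a __repr__ listing every field...
    def __repr__(self):
        return (f"InventoryItem(name={self.name!r}, "
                f"unit_price={self.unit_price!r}, quantity={self.quantity!r})")

    # ...and an __eq__ comparing instances field by field.
    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity) == \
                   (other.name, other.unit_price, other.quantity)
        return NotImplemented
```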
Dataclasses can do more than that, but those are the basics. Notably, the @dataclass decorator itself doesn’t actually care about the types (str, float, int): it mainly uses the type annotations as a concise syntax for specifying the names of the fields. But since dataclasses are pretty useful, they become a kind of “gateway drug” to using and learning more about Python type annotations.
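To make that concrete, here is a small sketch (hypothetical class name) showing that the annotations are not enforced at runtime:

```python
from dataclasses import dataclass

@dataclass
class Pair:
    x: int
    y: int

# The decorator reads the annotations only to learn the field names;
# passing a str where an int is annotated raises no error at runtime.
p = Pair(x="oops", y=2)
```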
Magic
The Stack Overflow blog post then seems to backpedal with the recommendation “Be more explicit than you need to be”. After all, having the dunder methods (__init__ etc.) generated automatically is less explicit than writing them out yourself. But the blog post clarifies that “The only real things that should ever be doing metaprogramming or reflection are frameworks themselves”, which makes it OK again to use dataclasses, since they are part of the standard library.
I agree; “magic” stuff should mainly be done by frameworks, and only when the benefits are tangible. And I think dataclasses are so useful (and again, part of the standard library) that it’s ok to expect readers of your code to know or be willing to learn about them.
However, in combination with other slightly magical things, I was recently wondering if the code I was writing was a bit “too magic”:
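The 17-line code block was lost in extraction. Based on the explanation that follows, it plausibly combined a Django REST framework serializer with two dataclasses, roughly like this sketch (the serializer part is shown as comments, since it needs a configured Django project to run; all names are taken from the surrounding text):

```python
from dataclasses import dataclass

@dataclass
class Thing:
    my_foo: int

@dataclass
class SpecialThing(Thing):  # inherits the my_foo field from Thing
    my_bar: int

# The DRF side would look roughly like this (not runnable here,
# because rest_framework requires a configured Django project):
#
# from rest_framework import serializers
#
# class SpecialThingSerializer(serializers.Serializer):
#     MyFoo = serializers.IntegerField(source="my_foo")
#     MyBar = serializers.IntegerField(source="my_bar")
#
#     def create(self, validated_data):
#         # validated_data == {'my_foo': ..., 'my_bar': ...}
#         return SpecialThing(**validated_data)
```

The dataclass half of the trick is runnable on its own: **-unpacking the validated_data dict into the auto-generated __init__ produces the object described below.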
This code can deserialize a JSON object such as {"MyFoo": 1, "MyBar": 2} into my (PEP 8-compliant) dataclass as an object with the representation SpecialThing(my_foo=1, my_bar=2). This overall journey should not be too surprising, but the individual steps to get there are not very explicit:
- The Django REST framework serializer will produce the validated_data dict {'my_foo': 1, 'my_bar': 2}, because when working in reverse (deserializing), the source field name becomes the target, so MyFoo becomes my_foo, etc.
- This dict is **-unpacked as keyword arguments to instantiate SpecialThing.
- The auto-generated SpecialThing.__init__ correctly handles both of these keyword arguments because one is inherited from the Thing dataclass.
I love how compact the code is, but I also have to acknowledge that it might not be clear to Python novices without the above explanation. But in the bigger picture I think this example is still within the limits of what is reasonable because:
- The overall purpose of the code (deserialization) is fairly straightforward once you are in the context of Django REST framework
- The code doesn’t use programmatically constructed variable names, so it passes the grep test, as mentioned in the Stack Overflow blog post
- The magic is done with the standard library and DRF, the main framework of the codebase.
In the end, you should write code so that it makes sense to the people who are going to read it.