My girlfriend introduced me to Eric Wastl’s Advent of Code, a programming puzzle Advent calendar. The best advent calendar I’ve had in my adult life! (If you’re planning on playing it, this article is one big spoiler for Day 5.) I compare declarative programming puzzle solutions on readability and speed.
The puzzle for Day 5 is:
Santa needs help figuring out which strings in his text file are naughty or nice.
A nice string is one with all of the following properties:
- It contains at least three vowels (
- It contains at least one letter that appears twice in a row, like
- It does not contain the strings
xy, even if they are part of one of the other requirements.
How many strings are nice?
On the page you get more explanation and an input to use for your code, but this is the basic challenge.
I’d normally do this in Python, but I’m brushing up on my C#-fu, so that’s what I went with. I first wrote it in classic, imperative style:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
One of the reasons I love Python is comprehensions. But
.NET has LINQ (Language Integrated Query), which I one of the reasons
I love C# too. By
using System.Linq I can turn the
Main() body into
a single, declarative statement:
1 2 3 4 5 6
It sure feels good to write compact code, but is it better than the imperative version? It depends on how you evaluate it.
Imperative code always requires detailed inspection and/or non-outdated comments to understand. On the other hand, I think the gist of the declarative version is very quick to read: “Count the number of lines in the input file which satisfy … some criteria”. Compared to the imperative version, the overall purpose is much clearer. But the individual criteria still take time to read.
1 2 3
At least three vowels, a repeated character, and no naughty substrings.
The keen reader may have noticed that the first criterion could be written more
line.Count("aeiou".Contains) >= 3 (similarly with the third).
I think this is less readable, and compactness is not a goal in itself, unless
you are playing code golf. Incidentally, the method group (compact) version
also performs slightly slower.
Variable names are a form of built-in documentation. Along the same lines, it might help to introduce variables:
But readability also depends on what you know.
For the uninitiated, regular expressions are complete gibberish. But
for the experienced,
using System.Text.RegularExpressions can actually state
the criteria more clearly. Instead of fussing over the individual characters of
the line and naughty patterns, it is easier to compare the whole, undivided
line to a pattern:
1 2 3
Regular expressions are declarative on an even higher level.
But they might not perform that well, especially when using uncompiled regular expressions. And that leads us to another criterion.
Doing a quick benchmark on an input with two million strings and warmed-up caches, the uncompiled Regex version completes in 6.6 seconds, the LINQ version in 4.4 seconds, and my naive imperative version in 2.4 seconds. So much for readability. (This was using Mono, btw.)
But, the declarative style has a trick up its sleeve: Parallelism for free!
1 2 3 4 5 6 7
1.8 seconds on my quadcore!
Now, I have not optimized any of these three versions for performance, and it could of course be done faster in C# (with or without explicit threads), and most likely even faster in C or C++.
So I think the conclusion so far is what it is nothing new: Readability and performance are conflicting goals.
One final criterion is debuggability. I have loved declarative style for a long time, but always found it harder to debug than imperative style. However, an advantage of Python and C# compared to, say, Haskell, is that they are basically imperative languages with other paradigms on top. Thus, you can gradually learn declarative and functional style in the safe context of imperative programming. And cut down on the declarative-ness when you want to.
However, next up on my reading list is LINQ Secrets Revealed: Chaining and Debugging.
I did the puzzle in Python just to see how it compared. Rather than reading the length of a list made with a list comprehension, I do it in a streaming fashion by summing over a generator expression:
1 2 3 4 5
(This is the whole program, not just the body of some main function!)
At 5.3 seconds for the two million strings, this stacks up pretty well against C#, considering that Python is interpreted.
Update 2016-03-12: I added in the performance section that I used Mono.
Update 2017-12-09: Changed the post title.