AoC 2015 Day 5: Fun with Declarative Programming

My girlfriend introduced me to Eric Wastl’s Advent of Code, a programming puzzle Advent calendar. The best advent calendar I’ve had in my adult life! (If you’re planning on playing it, this article is one big spoiler for Day 5.) I compare declarative programming puzzle solutions on readability and speed.

The puzzle for Day 5 is:

Santa needs help figuring out which strings in his text file are naughty or nice.

A nice string is one with all of the following properties:

It contains at least three vowels (aeiou only), like aei, xazegov, or aeiouaeiouaeiou.

It contains at least one letter that appears twice in a row, like xx, abcdde (dd), or aabbccdd (aa, bb, cc, or dd).

It does not contain the strings ab, cd, pq, or xy, even if they are part of one of the other requirements.

How many strings are nice?

On the page you get more explanation and an input to use for your code, but this is the basic challenge.

I’d normally do this in Python, but I’m brushing up on my C#-fu, so that’s what I went with. I first wrote it in classic, imperative style:

using System;
using System.IO;

namespace Day5
{
  class MainClass
  {
    public static void Main()
    {
      var nNice = 0;

      foreach (var line in File.ReadLines("../../input"))
      {
        if (line.Contains("ab") ||
            line.Contains("cd") ||
            line.Contains("pq") ||
            line.Contains("xy"))
        {
          continue;
        }

        var vowels = 0;
        foreach (var character in line)
        {
          if ("aeiou".Contains(character))
          {
            vowels++;
          }
        }
        if (vowels < 3)
        {
          continue;
        }

        foreach (var character in line)
        {
          if (line.Contains(new string(character, 2)))
          {
            nNice++;
            break;
          }
        }
      }
      Console.WriteLine(nNice);
    }
  }
}

One of the reasons I love Python is comprehensions. But .NET has LINQ (Language Integrated Query), which I one of the reasons I love C# too. By using System.Linq I can turn the Main() body into a single, declarative statement:

  Console.WriteLine(
    File.ReadLines("../../input")
    .Count(line =>
      line.Count(character => "aeiou".Contains(character)) >= 3 &&
      line.Any(character => line.Contains(new string(character, 2))) &&
      !(new[] { "ab", "cd", "pq", "xy" }.Any(naughty => line.Contains(naughty)))));

It sure feels good to write compact code, but is it better than the imperative version? It depends on how you evaluate it.

Readability

Imperative code always requires detailed inspection and/or non-outdated comments to understand. On the other hand, I think the gist of the declarative version is very quick to read: “Count the number of lines in the input file which satisfy … some criteria”. Compared to the imperative version, the overall purpose is much clearer. But the individual criteria still take time to read.

      line.Count(character => "aeiou".Contains(character)) >= 3 &&
      line.Any(character => line.Contains(new string(character, 2))) &&
      !(new[] { "ab", "cd", "pq", "xy" }.Any(naughty => line.Contains(naughty)))

At least three vowels, a repeated character, and no naughty substrings.

The keen reader may have noticed that the first criterion could be written more compactly as line.Count("aeiou".Contains) >= 3 (similarly with the third). I think this is less readable, and compactness is not a goal in itself, unless you are playing code golf. Incidentally, the method group (compact) version also performs slightly slower.

Variable names are a form of built-in documentation. Along the same lines, it might help to introduce variables:

  var vowels = "aeiou";
  var naughtySubstrings = new[] { "ab", "cd", "pq", "xy" };

But readability also depends on what you know. For the uninitiated, regular expressions are complete gibberish. But for the experienced, using System.Text.RegularExpressions can actually state the criteria more clearly. Instead of fussing over the individual characters of the line and naughty patterns, it is easier to compare the whole, undivided line to a pattern:

      Regex.Matches(line, "[aeiou]").Count >= 3 &&
      Regex.IsMatch(line, @"(.)\1") &&
      !Regex.IsMatch(line, "ab|cd|pq|xy")

Regular expressions are declarative on an even higher level.

But they might not perform that well, especially when using uncompiled regular expressions. And that leads us to another criterion.

Performance

Doing a quick benchmark on an input with two million strings and warmed-up caches, the uncompiled Regex version completes in 6.6 seconds, the LINQ version in 4.4 seconds, and my naive imperative version in 2.4 seconds. So much for readability. (This was using Mono, btw.)

But, the declarative style has a trick up its sleeve: Parallelism for free!

  Console.WriteLine(
    File.ReadLines("../../input")
    .AsParallel()  ///////////////////////////////// Parallelism for free!
    .Count(line =>
      line.Count(character => "aeiou".Contains(character)) >= 3 &&
      line.Any(character => line.Contains(new string(character, 2))) &&
      !(new[] { "ab", "cd", "pq", "xy" }.Any(naughty => line.Contains(naughty)))));

1.8 seconds on my quadcore!

Now, I have not optimized any of these three versions for performance, and it could of course be done faster in C# (with or without explicit threads), and most likely even faster in C or C++.

So I think the conclusion so far is what it is nothing new: Readability and performance are conflicting goals.

Debuggability

One final criterion is debuggability. I have loved declarative style for a long time, but always found it harder to debug than imperative style. However, an advantage of Python and C# compared to, say, Haskell, is that they are basically imperative languages with other paradigms on top. Thus, you can gradually learn declarative and functional style in the safe context of imperative programming. And cut down on the declarative-ness when you want to.

However, next up on my reading list is LINQ Secrets Revealed: Chaining and Debugging.

Python Postscript

I did the puzzle in Python just to see how it compared. Rather than reading the length of a list made with a list comprehension, I do it in a streaming fashion by summing over a generator expression:

import re
print sum(1 for line in open("input").readlines()
          if (len(re.findall('[aeiou]', line)) >= 3 and
              re.search(r'(.)\1', line) and
              not re.search('ab|cd|pq|xy', line)))

(This is the whole program, not just the body of some main function!)

At 5.3 seconds for the two million strings, this stacks up pretty well against C#, considering that Python is interpreted.

Update 2016-03-12: I added in the performance section that I used Mono.

Update 2017-12-09: Changed the post title.

Erik Ramsgaard Wognsen

Thoughts & technology

AoC 2015 Day 5: Fun with Declarative Programming

Readability

Performance

Debuggability

Python Postscript

Comments