A Lazy Sequence

CoffeeScript comprehensions are broken

12 September 2011

I have recently been using a small amount of CoffeeScript at work and evaluating its merit in my web development toolbox. This post is about a particular feature in CoffeeScript that is poorly considered.

CoffeeScript provides notation for comprehensions; A comprehension allows the programmer to concisely express operations over one or more sets of items. While Javascript programmers do not currently have a syntactic mechanism for this, it is part of the Harmony project.

First up lets examine a comprehension in CoffeeScript. The basic notion is that a comprehension is a for loop as an expression. For example:

l = x * 2 for x in [0..5]
// => [0, 2, 4, 6, 8, 10]

or alternatively:

l = for x in [0..5]
        x * 2
// => [0, 2, 4, 6, 8, 10]

This is clearly a straight forward mapping. We can also add filtering clause e.g.:

l = x * 2 for x in [0..5] when x % 2
// => [2, 6, 10]

So what is the problem? The answer is two-fold: firstly, the semantics of nested comprehensions are non-optimal (even incorrect), and secondly they are strict.

Nested comprehensions

To examine this issue, I am going to compare CoffeeScript and Python for the semantics of nested comprehensions. Firstly the CoffeeScript:

l = for x in [0, 1, 2] 
        for y in ["a", "b", "c"]
            "#{x},#{y}"
// => [["0,a", "0,b", "0,c"],
//     ["1,a", "1,b", "1,c"],
//     ["2,a", "2,b", "2,c"]]

And now Python:

l = ["%s,%s"% (x,y) for x in [0,1,2] for y in ["a","b","c"]]
# => ['0,a', '0,b', '0,c', 
#     '1,a', '1,b', '1,c', 
#     '2,a', '2,b', '2,c']

This illustrates the difference quite clearly. In Python, a language with real comprehensions, the comprehension results in a single list that is the cross product of both source lists. In CoffeeScript, we get a list of lists of objects.

To illustrate that the Python version is in fact more expressive, the following Python generates the same shaped result as the CoffeeScript version:

l = [["%s,%s" % (x,y) for y in ["a","b","c"]] for x in [0,1,2]]
# => [['0,a', '0,b', '0,c'], 
#     ['1,a', '1,b', '1,c'], 
#     ['2,a', '2,b', '2,c']]

I refer to the CoffeeScript behaviour as ‘map’ oriented comprehensions, and the Python behaviour as ‘mapcat’ oriented comprehensionsi. The mapcat model is more compositional than the map model. This means that the result of a function that is defined as mapcat oriented comprehension can be the input to itself recursively. For exampleii:

def flatten(l):
    try:
       return [y for x in l for y in flatten(x)]
    except TypeError, e: # horrible test for iterability
       return [l]

flatten(1)
# => [1]
flatten([1,2])
# => [1, 2]
flatten([[[1,2], 3], [[4]]])
# => [1, 2, 3, 4]

Strictness

Strictness is the property of a program to be evaluated in its entirety as soon as possible, in contrast to laziness which delays as much computation as long as possible. Most languages exist somewhere on the spectrum of fully strict to mostly lazy. With regard to comprehensions, a strict comprehension takes one or more lists and creates a new list. A lazy comprehension takes a stream of values and returns a stream of values, computing them by need.

The underlying mechanism of this laziness in an otherwise strict programming language is typically something resembling an iterator: You have an object with a current value and method of retrieving the remaining values. One of the advantages of this is that you can perform computations on an infinite stream. E.g. in Pythoniii, taking the first 10 items for a stream:

import itertools
i = itertools.islice(itertools.cycle("abc"), 10)
# i => <itertools.islice object at 0x1004c1f70>
list(i) 
# use a list to realise the iterator and produce a print friendly 
# representation
# => ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']

In the example above, itertools.cycle produces an infinite stream of “a”,”b”, and “c” cycling. If you attempted to realize it completely you would be waiting forever.

We can use then these iterators in a generator comprehension. For example, the following code implements a primitive cipher function, where the key characters are cycled for the length of the message using itertools.cycle:

import itertools
def cipher(message, key):
    return ''.join(chr(ord(m) ^ ord(k)) for (m, k) in 
                        itertools.izip(message, itertools.cycle(key)))

cipher("hello, world", "abc")
# => '\t\x07\x0f\r\rOA\x15\x0c\x13\x0e\x07'
cipher('\t\x07\x0f\r\rOA\x15\x0c\x13\x0e\x07', "abc")
# => 'hello, world'

This is a trivial example, and it could simply be replaced with an explicit loop doing some book keeping about the position into the key, yet it illustrated nicely how infinite streams can aid expressiveness.

Harmony

Both these issues are not only poor in isolation, but they are also at odds with the future of Javascript. The Harmony projectiv will introduce both generators and comprehensions to Javascript, and provide different semantics for both. There are three possibilities I see:

  • Break backwards compatibility for existing CoffeeScript code to support Harmony comprehensions and generators.
  • Add some extra syntax to support new modes (e.g. supporting {map, mapcat}✕{lazy, strict}).
  • Ignore Harmony’s comprehensions and generator features. This seems unlikely given the kind of developer who chooses CoffeeScript.

jQuery addendum

For completeness, jQuery provides an implementation of map and mapcat that provides all you need to macgyver together your own comprehensions with just a couple of functions and some rugous syntax. For example, you could implement flatten (see above) as follows:

function flatten(l) {
    if (!(l instanceof Array)) return [l];
    return $.map(l, flatten);
}

Due to the quirks of how jQuery’s map is actually map or mapcat depending on the return type of the function v, we can actually implement it as follows:

function flatten(l) {
    return l instanceof Array ? $.map(l, flatten) : l;
}

See Also

Footnotes

  1. Named for the Lisp function you would use to implement each style. Mapcat is simply a concatenate operation applied to the result of a mapping operation.
  2. Incidentally this example shows an awkward feature of Python’s comprehension syntax.
  3. Python programmers will point out that their language has both lazy and strict comprehensions.
  4. Brenden Eich’s Twitter Remix JS talk, check out the approved for ES.next slides.
  5. If the return type is an Array, it acts like mapcat, otherwise it acts like map. Weirdo, but handy.

Comprehensions continued

13 September 2011

In response to my post on CoffeeScript’s comprehension, Trevor Burnham — author of The CoffeeScript Bookexplained the rational behind CoffeeScript’s choices:

I think the objections here would be moot if CoffeeScript’s “list comprehensions” were called something else. They don’t behave like list comprehensions in Python, but their behavior is perfectly consistent with the CoffeeScript philosophy that every expression has a value, and loops are expressions. That means that a single loop has a single list as its value, and a loop within a loop produces a list of lists. Note that unlike Python—or Harmony—there’s no special delimiter to say “Give me a list comprehension.” It’s just the natural implication of how expressions in CoffeeScript work.