Some leaky abstractions in C++11

Tags: programming, musings

Published on
« Previous post: Why I love compilers — Next post: YouCompleteMe and CMake »

Thanks to the introductory course in computer science taught by my colleague Filip Sadlo, I am starting to reacquaint myself more with the thought processes of beginners who learn C++ for the first time. Programming is all about abstraction. Finding out how to solve problems generally and generically instead of providing a bespoke solution for a single problem. This is certainly the intention of the Standard Template Library. As students, we all learned to use std::sort, std::transform, std::for_each, and so on. With the arrival of C++11, algorithms based on functors or predicates became even more useful. Instead of having to specify a function object beforehand, I can simply use an in-line lambda expression as the input of, say, std::remove_copy_if.

That’s all very well, but unfortunately, the STL and C++11 is somewhat “leaky”at times. Joel Spolsky coined the term ldquo;leaky abstraction” to refer to abstractions that, often inadvertently, break down and expose knowledge about the implementation to the user. During my—for want of a better word—“career”, I encountered multiple such leaks in the STL and C++. I want to discuss two interesting ones in detail here, because they tend to confuse beginners the most.

How to sort a list

Having seen the horrors of qsort() in the C library, you are glad that C++ offers std::sort. The following works great:

std::array<double, 3> A = { { 3.0, 2.0, 1.0 } };
std::sort( A.begin(), A.end() );

This works likewise:

std::vector<double> B = { 3.0, 2.0, 1.0 };
std::sort( B.begin(), B.end() );

You start feeling confident. Obviously, you have figured that one out. So:

std::list<double> C = { 3.0, 2.0, 1.0 };
std::sort( C.begin(), C.end() );

But the compiler denies you this small victory. Instead of the compiled code, you get pages of error messages. The salient points seems to be:

In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/algorithm:62:
/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1882:18: error: invalid operands to binary expression ('std::_List_iterator<double>' and
      'std::_List_iterator<double>')
      if (__last - __first > int(_S_threshold))

This is the way of the STL to tell you that the iterators of a list do not support subtraction. They are not random access iterators. The solution is to use the built-in sorting function for lists:

std::list<double> C = { 3.0, 2.0, 1.0 };
C.sort();

This feels cheap, does it not? Vector, for example, does not have this built-in function. Hence, you need to check the container type before applying a sort function to it. But this is completely contrary to the intention of the STL, which wants you to prefer generic algorithms over specific implementations.

Of course, this behaviour is well-documented and can easily be justified. It’s confusing for the beginner, nonetheless.

Copying functions

C++11 added the great std::function class, which permits us programmers to write less verbose code. For example, I often use a construction along the following lines:

std::function< double ( double ) > square = [] ( double a ) { return a*a; };
std::function< double ( double ) > root   = [] ( double a ) { return std::sqrt(a); };

// ... 

std::function< double ( double ) > f = square;
f( 2.0 );
f( 4.0 );

// ...

f = root;
f( 2.0 );
f( 4.0 );

The idea is to be able to reassign f later on in order to change the behaviour without writing new code. So far, so good. But how could we make this less verbose? By using auto, of course. This lets the compiler figure out the proper function signature:

auto square = [] ( double a ) { return a*a; };
auto root   = [] ( double a ) { return std::sqrt(a); };

// ... 

auto f = square;
f( 2.0 );
f( 4.0 );

// ...

f = root;
f( 2.0 );
f( 4.0 );

But, woe is us! Another compiler error:

f.cc:14:5: error: no viable overloaded '='
  f = root;

What happened? Well, we were being sneaky. The syntax we used actually refers to a lambda expression. In order to be able to create more compact and efficient code, whenever we specify it with auto, the type of the expression will be a special compiler-generated type. std::function, on the other hand, is more heavy-weight—it will typically incur some small overhead cost for calling. So, again, the compiler behaviour is justified (as always), but it certainly makes for some confusion of the sort “Why am I suddenly losing the ability to reassign function objects when using auto?”.

These are just two examples that I recently encountered. I am sure there are tons and tons more. I am not sure what would be better, from a pedagogical point of view:

  1. Teaching these “leaks” along with the regular material
  2. Waiting for students to encounter them on their own and be confused.

So far, I am using the second method. I try to justify this by the assumption that students learn more that way…