Some leaky abstractions in C++11
Tags: programming, musings
Thanks to the introductory course in computer science taught by my colleague Filip
Sadlo, I am starting to reacquaint
myself more with the thought processes of beginners who learn C++ for the first time. Programming is
all about abstraction. Finding out how to solve problems generally and generically instead of
providing a bespoke solution for a single problem. This is certainly the intention of the Standard
Template Library. As students, we all learned to use std::sort
, std::transform
, std::for_each
,
and so on. With the arrival of C++11, algorithms based on functors or predicates became even more
useful. Instead of having to specify a function object beforehand, I can simply use an in-line
lambda expression as the input of, say, std::remove_copy_if
.
That’s all very well, but unfortunately, the STL and C++11 is somewhat “leaky”at times. Joel Spolsky coined the term ldquo;leaky abstraction” to refer to abstractions that, often inadvertently, break down and expose knowledge about the implementation to the user. During my—for want of a better word—“career”, I encountered multiple such leaks in the STL and C++. I want to discuss two interesting ones in detail here, because they tend to confuse beginners the most.
How to sort a list
Having seen the horrors of qsort()
in the C library, you are glad that C++ offers std::sort
. The
following works great:
std::array<double, 3> A = { { 3.0, 2.0, 1.0 } };
std::sort( A.begin(), A.end() );
This works likewise:
std::vector<double> B = { 3.0, 2.0, 1.0 };
std::sort( B.begin(), B.end() );
You start feeling confident. Obviously, you have figured that one out. So:
std::list<double> C = { 3.0, 2.0, 1.0 };
std::sort( C.begin(), C.end() );
But the compiler denies you this small victory. Instead of the compiled code, you get pages of error messages. The salient points seems to be:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/algorithm:62:
/usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/stl_algo.h:1882:18: error: invalid operands to binary expression ('std::_List_iterator<double>' and
'std::_List_iterator<double>')
if (__last - __first > int(_S_threshold))
This is the way of the STL to tell you that the iterators of a list do not support subtraction. They are not random access iterators. The solution is to use the built-in sorting function for lists:
std::list<double> C = { 3.0, 2.0, 1.0 };
C.sort();
This feels cheap, does it not? Vector, for example, does not have this built-in function. Hence, you need to check the container type before applying a sort function to it. But this is completely contrary to the intention of the STL, which wants you to prefer generic algorithms over specific implementations.
Of course, this behaviour is well-documented and can easily be justified. It’s confusing for the beginner, nonetheless.
Copying functions
C++11 added the great std::function
class, which permits us programmers to write less verbose
code. For example, I often use a construction along the following lines:
std::function< double ( double ) > square = [] ( double a ) { return a*a; };
std::function< double ( double ) > root = [] ( double a ) { return std::sqrt(a); };
// ...
std::function< double ( double ) > f = square;
f( 2.0 );
f( 4.0 );
// ...
f = root;
f( 2.0 );
f( 4.0 );
The idea is to be able to reassign f
later on in order to change the behaviour without writing new
code. So far, so good. But how could we make this less verbose? By using auto, of course. This
lets the compiler figure out the proper function signature:
auto square = [] ( double a ) { return a*a; };
auto root = [] ( double a ) { return std::sqrt(a); };
// ...
auto f = square;
f( 2.0 );
f( 4.0 );
// ...
f = root;
f( 2.0 );
f( 4.0 );
But, woe is us! Another compiler error:
f.cc:14:5: error: no viable overloaded '='
f = root;
What happened? Well, we were being sneaky. The syntax we used actually refers to a lambda
expression. In order to be able to create more compact and efficient code, whenever we specify it
with auto
, the type of the expression will be a special compiler-generated type. std::function
,
on the other hand, is more heavy-weight—it will typically incur some small overhead cost for
calling. So, again, the compiler behaviour is justified (as always), but it certainly makes for
some confusion of the sort “Why am I suddenly losing the ability to reassign function objects
when using auto
?”.
These are just two examples that I recently encountered. I am sure there are tons and tons more. I am not sure what would be better, from a pedagogical point of view:
- Teaching these “leaks” along with the regular material
- Waiting for students to encounter them on their own and be confused.
So far, I am using the second method. I try to justify this by the assumption that students learn more that way…