Return value optimization in C++

Tags: programming

Published on
« Previous post: Postincrement vs. preincrement in C++ — Next post: Displaying Kindle clippings for the web »

It is very common—not only in object-oriented programming—to have functions that generate a potentially complicated object (based on the input parameters) or modify a given one without changing the original object. For example, suppose we have a class Foo that stores integers and want to construct a new Foo object for storing a given number. There are two ways for writing such a method:

// 1st way
Foo f(unsigned int n)
{
  Foo f(n+1);
  return f;
}

// Usage: Foo foo = f(42);

// 2nd way
void g(Foo& foo, unsigned int n)
{
  foo.set(n+1);
}

// Usage:
//   Foo foo;
//   g(foo, 42);

Now, to be honest, I always cringe when I see the second way implemented. It just looks unnatural—why should I modify a given object when I essentially want a new one to be stored? The way g() is implemented seems to rather clumsy. Ostensibly, this implementation is faster because it does not involve creating a new Foo instance, setting its values, and copying it into a new object. Is this really true, though? Let’s find out:

#include <iostream>

class Foo
{
public:
  Foo()
    : data(0)
  {
    std::cout << "\t" << __PRETTY_FUNCTION__ << std::endl;
  }

  Foo(unsigned int n)
    : data(n)
  {
    std::cout << "\t" << __PRETTY_FUNCTION__ << ": " << n << std::endl;
  }

  Foo( const Foo& other )
  {
    std::cout << "\t" << __PRETTY_FUNCTION__ << std::endl;
  }

  void set(unsigned int n)
  {
    std::cout << "\t" << __PRETTY_FUNCTION__ << ": " << n << std::endl;
    data = n;
  }

private:
  unsigned int data;
};

Foo f(unsigned int n)
{
  Foo f(n+1);
  return f;
}

void g(Foo& foo, unsigned int n)
{
  foo.set(n+1);
}

int main(int, char**)
{
  {
    std::cout << "f:" << std::endl;
    Foo foo = f(41);
  }

  {
    std::cout << "g: " << std::endl;
    Foo foo;
    g(foo,41);
  }

  return 0;
}

Using gcc 4.9.2, this produces the following output:

f:
        Foo::Foo(unsigned int): 42
g: 
        Foo::Foo()
        void Foo::set(unsigned int): 42

Darn. The copy constructor is not called. What manner of sorcery is this? This procedure is known as return value optimization. The compiler may, in certain situations, omit a copy operation of classes. This is one such situation. The object created within f() is a temporary and will not be required afterwards. A stupid compiler would create the object (one constructor call), return it, and call the copy operator of the second object (another constructor call). However, since the object within f() is not used afterwards, a smarter compiler can skip the whole shebang and simply create one object with the appropriate constructor call.

For the curious, we can disable this behaviour by setting the gcc flag -fno-elide-constructors:

$ g++ -fno-elide-constructors rvo.cc
$ ./a.out
f:
        Foo::Foo(unsigned int): 42
        Foo::Foo(const Foo&)
        Foo::Foo(const Foo&)
g: 
        Foo::Foo()
        void Foo::set(unsigned int): 42

Yikes! This is terrible, but at least the expected behaviour. Does this always work? As it turns out, no. It very much depends on the compiler you are using. For example, a standard example in which the return value optimization might not work is when multiple execution paths are present, like in this example:

Foo h(unsigned int n, bool more = false)
{
  if( more )
    return Foo(n+1);
  else
    return Foo(n);
}

However, my tests with gcc 4.9.2 indicate that at least this very simple example does not yet fool the compiler. Thus, I can only conclude that we should give the compiler more credit instead of assuming that a certain idiom “naturally” results in better performance (as the proponents of the second way are wont to do). In the words of Abelson and Sussman:

[…] Thus, programs must be written for people to read, and only incidentally for machines to execute.

In C++11, there is an additional optimization available in the form of move constructors (and move assignment operators, which I will not cover):

Foo( Foo&& other )
  : data(other.data)
{
  std::cout << "\t" << __PRETTY_FUNCTION__ << ": " << data << std::endl;
}

The move constructor is called for rvalue references. When disabling return value optimization, we can see that this constructor is called instead of the usual copy constructor:

$ g++ -fno-elide-constructors -std=c++11 rvo.cc
$ ./a.out
f:
Foo::Foo(unsigned int): 42
Foo::Foo(Foo&&): 42
Foo::Foo(Foo&&): 42

Thus, the moral of the story is:

  1. Do not worry about the return values.
  2. If your implementation permits it, use move constructors to achieve a better performance when the compiler is unable to use return value optimization.
  3. Measure whether a purported optimization really has an effect.

In C++17, this construction got another intriguing update. The situation described above deals with a so-called prvalue, i.e. a ‘pure’ rvalue. The standard was amended as follows:

In C++17, copy elision was made mandatory in some situations, and that required separation of prvalue expressions from the temporary objects initialized by them, resulting in the system we have today. Note that, in contrast with the C++11 scheme, prvalues are no longer moved from.

Hence, if you run this code with C++17, you will get the following output:

$ g++ -fno-elide-constructors -std=c++17 rvo.cc
f:
    Foo::Foo(unsigned int): 42
    Foo::Foo(Foo&&): 42

Even with constructor elision, you have only a single move operation, namely the one induced by creating an object in f and returning it afterwards. If you were to return the object directly, the new standard would enforce that no copy would be created. I am always amazed how C++ continues to move forward. The Times they are a-changin’…