Friday 5 February 2016

Filling std::array in-place (emplace_array)

Take this local std::array:
std::array<int, 100> values;
According to the rules of C++, its elements are uninitialized (also the case for data members). If we wish to initialize values with all zeros, we can use aggregate initialization syntax:
std::array<int, 100> values = {}; // all zeros
However, to fill the array with a value other than zero would require quite a bit of typing:
std::array<int, 100> values = { 1, 1, 1, 1, … }; // keep going...
Even then, if we increase the size of the array, we must add extra values to the initializer list, or some elements will be zero-initialized.

Thankfully, there is a better solution:
std::array<int, 100> values; // uninitialized
values.fill(1); // efficient: no initial content
This is efficient for non-class types, which are uninitialized by default, but is less efficient for class types that have a well-defined value after initialization.
std::array<std::string, 100> values; // initialized to ""
values.fill("1"); // inefficient: overwrites initial content
Ideally, all elements would be initialized to "1". One option is to use std::vector instead of std::array:
constexpr auto size = 100u;
std::vector<std::string> values;
values.reserve(size);
for (size_t i = 0; i < size; ++i)
{
    values.emplace_back("1");
}
But what if we want the compile-time efficiency offered by the stack-allocated std::array? Is our only option an enormous initializer list?
std::array<std::string, 100> values = { "1", "1", "1", … }; // ...
Well, yes, but we can make the compiler do the work for us!

The solution

#include <array>
#include <cstddef>
#include <type_traits>

template<std::size_t N, typename T, typename... Ts>
struct array_emplacer
{
    template<
        typename... Args,
        typename = std::enable_if_t<(
            sizeof...(Args),
            sizeof...(Ts) + 1 < N
        )>
    >
    static std::array<T, N> emplace(Args const&... args)
    {
        return array_emplacer<N, T, T, Ts...>::emplace(args...);
    }

    template<
        typename... Args,
        typename = std::enable_if_t<(
            sizeof...(Args),
            sizeof...(Ts) + 1 == N
        )>,
        typename = void
    >
    static std::array<T, N> emplace(Args const&... args)
    {
        return std::array<T, N>{ T(args...), Ts(args...)... };
    }
};

template<typename T, std::size_t N, typename... Args>
std::array<T, N> emplace_array(Args const&... args)
{
    return array_emplacer<N, T>::emplace(args...);
}
Now filling our array on initialization is easy:
auto values = emplace_array<std::string, 100>("1");
In fact, we can pass any number of arguments to use for intialization, just like functions like std::vector::emplace_back:
auto values = emplace_array<std::string, 100>(3u, '9')

How does it work?


Look at this line:
        return std::array<T, N>{ T(args...), Ts(args...)... };
We see that emplace_array is generating an initializer list by somehow creating a parameter pack, Ts, which expands to N – 1 copies of the argument, T. The parameter pack, Args, is expanded into each copy of the call to T's constructor, thereby constructing the array elements in-place.

Ts is generated using template recursion.
        return array_emplacer<N, T, T, Ts...>::emplace(args...);
Here, emplace recursively calls a version of itself, increasing the number of T's by one each time. Note that template parameter lists may only contain one template parameter pack, and emplace already has a template parameter pack, Args. Thus, Ts is accumulated by recursively defining the class template, array_emplacer. Since emplace is a static member of array_emplacer, it has access to its template parameters.

The template recursion is terminated by using std::enable_if to selectively enable a terminating version of emplace when we have exactly N copies of T. We use the comma operator to redundantly include Args in the expression to activate SFINAE (substitution only involves the function template's own arguments; N and T are arguments of array_emplacer).

Finally, thanks to return value optimization, all modern compilers will optimize away all copies of the returned std::array<T, N>, so that this:
auto values = emplace_array<std::string, 5>("1");
produces the exact same compiled code as this:
std::array<std::string, 5> values = { "1", "1", "1", "1", "1" };

Limitations


In practice this solution is less than ideal. It works fine for small values of N, but as N increases, so do our problems.

First, code bloat. Compilers may be able to optimize the aggregate initialization down to a simple loop which constructs each element of the array. On the other hand, they may not, so be wary (and perhaps use std::vector instead).

Secondly, when it comes to recursion, compilers generally have limits. My version of GCC has a default maximum template instantiation depth of 900, meaning that compilation will fail for values of N larger than 898.

Thirdly, the complexity of recursion in our template metaprogram is O(n), which means that compile times will increase linearly as the size of our array increases. This is bad news.

Coming soon: emplace_array 2.0


While the first issue may always be a problem, the other problems are soluble. In my next post, I will show how we can use more advanced TMP techniques to reduce the complexity to an acceptable O(log n).

No comments:

Post a Comment