Friday 5 February 2016

Filling std::array in-place (emplace_array)

Take this local std::array:
std::array<int, 100> values;
According to the rules of C++, its elements are uninitialized (also the case for data members). If we wish to initialize values with all zeros, we can use aggregate initialization syntax:
std::array<int, 100> values = {}; // all zeros
However, to fill the array with a value other than zero would require quite a bit of typing:
std::array<int, 100> values = { 1, 1, 1, 1, … }; // keep going...
Even then, if we increase the size of the array, we must add extra values to the initializer list, or some elements will be zero-initialized.

Thankfully, there is a better solution:
std::array<int, 100> values; // uninitialized
values.fill(1); // efficient: no initial content
This is efficient for non-class types, which are uninitialized by default, but is less efficient for class types that have a well-defined value after initialization.
std::array<std::string, 100> values; // initialized to ""
values.fill("1"); // inefficient: overwrites initial content
Ideally, all elements would be initialized to "1". One option is to use std::vector instead of std::array:
constexpr auto size = 100u;
std::vector<std::string> values;
values.reserve(size);
for (size_t i = 0; i < size; ++i)
{
    values.emplace_back("1");
}
But what if we want the compile-time efficiency offered by the stack-allocated std::array? Is our only option an enormous initializer list?
std::array<std::string, 100> values = { "1", "1", "1", … }; // ...
Well, yes, but we can make the compiler do the work for us!

The solution

#include <array>
#include <cstddef>
#include <type_traits>

template<std::size_t N, typename T, typename... Ts>
struct array_emplacer
{
    template<
        typename... Args,
        typename = std::enable_if_t<(
            sizeof...(Args),
            sizeof...(Ts) + 1 < N
        )>
    >
    static std::array<T, N> emplace(Args const&... args)
    {
        return array_emplacer<N, T, T, Ts...>::emplace(args...);
    }

    template<
        typename... Args,
        typename = std::enable_if_t<(
            sizeof...(Args),
            sizeof...(Ts) + 1 == N
        )>,
        typename = void
    >
    static std::array<T, N> emplace(Args const&... args)
    {
        return std::array<T, N>{ T(args...), Ts(args...)... };
    }
};

template<typename T, std::size_t N, typename... Args>
std::array<T, N> emplace_array(Args const&... args)
{
    return array_emplacer<N, T>::emplace(args...);
}
Now filling our array on initialization is easy:
auto values = emplace_array<std::string, 100>("1");
In fact, we can pass any number of arguments to use for intialization, just like functions like std::vector::emplace_back:
auto values = emplace_array<std::string, 100>(3u, '9')

How does it work?


Look at this line:
        return std::array<T, N>{ T(args...), Ts(args...)... };
We see that emplace_array is generating an initializer list by somehow creating a parameter pack, Ts, which expands to N – 1 copies of the argument, T. The parameter pack, Args, is expanded into each copy of the call to T's constructor, thereby constructing the array elements in-place.

Ts is generated using template recursion.
        return array_emplacer<N, T, T, Ts...>::emplace(args...);
Here, emplace recursively calls a version of itself, increasing the number of T's by one each time. Note that template parameter lists may only contain one template parameter pack, and emplace already has a template parameter pack, Args. Thus, Ts is accumulated by recursively defining the class template, array_emplacer. Since emplace is a static member of array_emplacer, it has access to its template parameters.

The template recursion is terminated by using std::enable_if to selectively enable a terminating version of emplace when we have exactly N copies of T. We use the comma operator to redundantly include Args in the expression to activate SFINAE (substitution only involves the function template's own arguments; N and T are arguments of array_emplacer).

Finally, thanks to return value optimization, all modern compilers will optimize away all copies of the returned std::array<T, N>, so that this:
auto values = emplace_array<std::string, 5>("1");
produces the exact same compiled code as this:
std::array<std::string, 5> values = { "1", "1", "1", "1", "1" };

Limitations


In practice this solution is less than ideal. It works fine for small values of N, but as N increases, so do our problems.

First, code bloat. Compilers may be able to optimize the aggregate initialization down to a simple loop which constructs each element of the array. On the other hand, they may not, so be wary (and perhaps use std::vector instead).

Secondly, when it comes to recursion, compilers generally have limits. My version of GCC has a default maximum template instantiation depth of 900, meaning that compilation will fail for values of N larger than 898.

Thirdly, the complexity of recursion in our template metaprogram is O(n), which means that compile times will increase linearly as the size of our array increases. This is bad news.

Coming soon: emplace_array 2.0


While the first issue may always be a problem, the other problems are soluble. In my next post, I will show how we can use more advanced TMP techniques to reduce the complexity to an acceptable O(log n).

Thursday 31 December 2015

Generic static_cast for classes

If you can static_cast the data members of one class to the corresponding data members of another class, shouldn't you be able to static_cast directly between those classes? I certainly think so, and here I show how to do exactly that.

Motivation


I have two classes that, when simplified, look something like this:
struct float2
{
    float x;
    float y;
};

struct int2
{
    int i;
    int j;
};
These are 2D coordinate classes, one a vector, and the other its integer analogue. To convert from one type to the other, without writing any helper functions, it would look like this:
auto ci = int2{ 1, 2 };
auto cf = float2{
    static_cast<float>(ci.i),
    static_cast<float>(ci.j)
};
That's a mouthful! And it gets worse with the corresponding 3D and 4D types. Wouldn't it be far nicer to be able to write this?
auto ci = int2{ 1, 2 };
auto cf = static_cast<float2>(ci);

Conversion Operator


No problem, I hear you say. Simply write a conversion operator for int2:
    explicit operator float2() const
    {
        return { static_cast<float>(i), static_cast<float>(j) };
    }
Great! Job done, right? Well, only if you don't mind int2 being dependent on float2. And I assume you want a similar conversion operator in float2 so that you can cast from float2 to int2, so I hope you don't mind cyclic dependencies.

Templates


If you are like me, and you have a whole bunch of classes like this, you will want to avoid those pesky dependencies. Sounds like a job for generic programming!
    template<typename T>
    explicit operator T() const
    {
        return T{
            static_cast<decltype(T::x)>(i),
            static_cast<decltype(T::y)>(j)
        };
    }
Well, that's hardly the most generic code; it assumes T has data members named x and y. What if I have another type, short2, which has data members i and j? Maybe we could write a constructor for float2 instead.
    template<typename T>
    float2(T const& other) :
            x(static_cast<float>(other.i)),
            y(static_cast<float>(other.j))
    {
    }
This time we assume that T has data members named i and j, so this is no good either. In fact, this is even less promising than our previous approach; at least that didn't directly reference any data member instances. What we really want is for static_cast to be able to automatically deduce the target type something like this:
    template<typename T>
    explicit operator T() const
    {
        return T{ static_cast<auto>(i), static_cast<auto>(j) };
    }
While this isn't allowed in C++, there is a solution that comes close.

auto_static_cast


GManNickG devised an ingenious auto_cast "operator", which wraps the object under conversion in a "forwarding" proxy object which converts to any type, thereby emulating automatic casting. His auto_cast is actually slightly safer than static_cast, as it disallows downcasting. We want to emulate the behaviour of static_cast, so our version will be slightly modified.
#include <utility>

template<typename T>
class auto_static_cast_proxy;

template<typename T>
auto_static_cast_proxy<T> auto_static_cast(T&&);

template<typename T>
class auto_static_cast_proxy
{
    friend auto_static_cast_proxy auto_static_cast<T>(T&&);

private:
    T&& value;

    auto_static_cast_proxy(T&& value) :
        value(std::forward<T>(value))
    {
    }

public:
    template<typename U>
    operator U() const
    {
        return static_cast<U>(std::forward<T>(value));
    }
};

template<typename T>
auto_static_cast_proxy<T> auto_static_cast(T&& value)
{
    return auto_static_cast_proxy<T>(std::forward<T>(value));
}

The Solution


Using this bit of Voodoo wizardry, let's modify our conversion operator:
    template<typename T>
    explicit operator T() const
    {
        return T{ auto_static_cast(i), auto_static_cast(j) };
    }
And there you have it. Generic static_cast for the masses. Of course, this operator will work on any constructor, not just aggregate initialization; I'm not sure whether this is a good or bad thing, but you could selectively enable the operator using SFINAE (e.g. to only support PODs) if you so wished.

The Better Solution


We can make it even easier to write generic static_cast operators for our types using a simple helper function:
template<typename T, typename... Args>
T static_cast_args(Args&&... args)
{
    return T{ auto_static_cast(std::forward<Args>(args))... };
}
Thus, our operator implementation becomes:
    template<typename T>
    explicit operator T() const
    {
        return static_cast_args<T>(i, j);
    }
Lovely.

Saturday 3 October 2015

How to get a type_index without RTTI

Have you ever wondered why the typeid operator only works when run-time type information (RTTI) is enabled?  Sure, RTTI is necessary when deducing the concrete types of polymorphic objects at runtime, but why should typeid(T) require RTTI when T is known at compile time?  Surely something as simple as converting T to unique integer ID should be possible?  Here I will show how to create your own CTTI (compile time type information) type_id function.
Note: The Boost libraries project offers a more comprehensive solution in Boost.TypeIndex, which not only provides an RTTI-less type_index, but also type names in string form (I assume they are using some compiler-specific Voodoo magic).  If you are unable or unwilling to use Boost, or are just interested for academic reasons, then continue reading!

Roll your own type_index


We want to be able to do something like this.

auto id = type_id<int>();

Our task is to produce a distinct value for each and every type.  One solution is to use a static local variable within a function template.

template<typename T>
int* type_id()
{
    static int id;
    return &id;
};

Thanks to the One Definition Rule, the linker must ensure that there is a single definition of each function template instance across all translation units, and since memory addresses must be unique, we now have a unique identifier for every possible type! This is the same principle that is used to implement the singleton pattern (though I wouldn't recommend it).

However, ideally we would like the same ID for regular, const, volatile and reference versions of a type (this is what typeid does).  We can use the relevant type traits templates to strip these off.

template<typename T>
int* type_id()
{
    using t = std::remove_cv_t<std::remove_reference_t<T>>;
    return type_id_with_cvr<t>();
};

template<typename T>
int* type_id_with_cvr()
{
    static int id;
    return &id;
};

But we aren't quite finished. Using the int* directly as a type index is less than ideal, the main problem being that arbitrary pointers cannot be compared using operator< and company. We can address this by creating a nice type-safe type_index wrapper class. And while we're at it, let's also add a specialization for std::hash so that we can use type_index objects as keys in unordered associative containers.

#include <functional>
#include <type_traits>

class type_index
{
private:
    int* id;
    type_index(int* id) : id(id) {}

public:
    bool operator==(type_index const& t) const
    { return id == t.id; }
    bool operator!=(type_index const& t) const
    { return id != t.id; }
    bool operator<(type_index const& t) const
    { return std::less<int*>()(id, t.id); }
    bool operator<=(type_index const& t) const
    { return !(t > *this); }
    bool operator>(type_index const& t) const
    { return t < *this; }
    bool operator>=(type_index const& t) const
    { return !(*this < t); }

    std::size_t hash_code() const { return std::hash<int*>()(id); }

    template<typename T>
    friend type_index type_id()
    {
        using t = std::remove_cv_t<std::remove_reference_t<T>>;
        return type_id_with_cvr<t>();
    };

    template<typename T>
    friend type_index type_id_with_cvr()
    {
        static int id;
        return &id;
    };
};

namespace std
{
    template<>
    struct hash<type_index>
    {
        std::size_t operator()(type_index const& t) const
        {
            return t.hash_code();
        }
    };
}

Now we're ready to roll! Keep in mind that the type_index for a given type will only be unique within the current program. Sharing type_index objects over shared library boundaries is not possible.
Note: If you do not have C++14, you will have to implement your own type traits helper types (remove_cv_t and remove_reference_t).  If you do not have C++11, you will have to implement your own type traits constructs from scratch (or find an existing implementation).

Tuesday 29 September 2015

The Pimpl idiom: you're doing it wrong

Ready for a surprise?  Your implementation of the Pimpl idiom is probably broken and not just a bit broken very broken.  This is because most of the examples on the internet and in books are wrong.  Wikipedia does it wrong; Herb Sutter does it wrong; Scott Meyers even does it wrong in his new book on modern C++1; .  Pretty much everyone does it wrong.

How so?


So what is the problem with Pimpl?  It comes down to how pointers work: the constness of a pointer does not affect the constness of the pointee (the data it points to).  Allow me to demonstrate:

int* p1;             // p1 is mutable;   *p1 is mutable
int* const p2;       // p2 is immutable; *p2 is mutable
int const* p3;       // p3 is mutable;   *p3 is immutable
int const* const p4; // p4 is immutable; *p4 is immutable

Apologies if I am stating the obvious here, but this is the crux of the matter; a const pointer is different than a pointer to const.  The former cannot be changed to point to something else, while the latter cannot be used to modify what it points to.  Of course, you may have a const pointer to const, but the effects of the two constnesses are still completely independent of one another.

So how does this relate to Pimpl?  Look at this member function taken from a class using the Pimpl idiom:

bool widget::is_zero() const {
    return impl->value = 0;
}

Spot the problem?  This is the canonical example of accidentally typing = when we meant to type ==, thereby modifying our value instead of performing an equality test.  Ordinarily, value would be immutable within a const member function, and the compiler would complain about it being read-only.  Unfortunately, value is not a member of widget.  Sure, the pointer impl is immutable, but this just means we cannot make it point to something else; what it points to will still be mutable.

Dangerous behaviour


That's right: by valiantly wielding one of the staple tools in the C++ developer's toolbox, we have unwittingly demolished one of the pillars of good C++ design: const-correctness.  This is particularly worrying in the modern era of concurrency, considering that const-correctness is essential to writing thread-safe software.

Ironically, it all comes down to the fact that pointers are ill-suited for implementing the Pimpl ("pointer" to implementation) idiom.  std::unique_ptr doesn't even make things better, as it has the same dereferencing behaviour as a raw pointer (and so it should).  Of course, std::unique_ptr does provide us with automatic resource management, so we would prefer to use it over a raw pointer and explicit delete.

The solution


What we need is something like std::unique_ptr, but where the constness of the pointee is tied to the constness of the pointer itself.  Luckily, C++ provides us the tools to do exactly that:

template<typename T>
class pimpl_ptr
{
private:
    T* impl;

public:
    pimpl_ptr(T* impl = nullptr) : impl(impl) {
    }

    pimpl_ptr(pimpl_ptr const&) = delete;
    pimpl_ptr& operator=(pimpl_ptr const&) = delete;

    pimpl_ptr(pimpl_ptr&& other) : impl(other.impl) {
        other.impl = nullptr;
    }

    pimpl_ptr& operator=(pimpl_ptr&& other) {
        impl = other.impl;
        other.impl = nullptr;
        return *this;
    }

    ~pimpl_ptr() {
        delete impl;
    }

    T const& operator*() const {
        return *impl;
    }

    T& operator*() {
        return *impl;
    }

    T const* operator->() const {
        return impl;
    }

    T* operator->() {
        return impl;
    }
};

Note the const overloads of the star and arrow operators which return reference/pointer to const.  This wrapper class is significantly simpler than any std::unique_ptr implementation, but given its specialized purpose, I don't think we need anything more complicated.  Simply drop it in place of std::unique_ptr and you're ready to roll.

In conclusion


Quite how such a dangerous flaw has gone largely unappreciated by the C++ community for so long, I don't know.  Of course, I am not the first to notice this issue; indeed, it seems a proposal has been submitted to the standards committee which attempts to address the problem in much the same way as I have here.  Nonetheless, I hope this post will help to raise awareness before a solution is standardized *cough*modules*cough*.


1 Effective Modern C++, Item 22