Compile Time Aggregate Introspection in C++20 - part 1: Counting Fields#

Recently, a number of “reflection” libraries have appeared for C++20 that purport to be pure, standard C++. I’ve worked with and on various attempts at compile time reflection in C++ through the years and have never really come up with anything fully satisfactory. Those attempts did the work, but the syntax the user had to use was never really what I’d call “natural”. Now it has been accomplished in a pretty clean, approachable way by a few different libraries.

So far I’ve seen introspection only. You can feed a type to a function, and so long as it meets certain conditions, you can iterate or query its fields and receive their names and types. You can set and manipulate values. What you can’t yet do is construct a new type that looks like a natural C++ class; in this regard you are still stuck manipulating static maps or something.

Some examples:

reflect-cpp - fulfilling the role of a general purpose reflection library that serializes in a few formats.
glaze - uses internal constructs that implement reflection to serialize in JSON in a fast way.

Well, I want to know how they are doing it and so I’ve been studying the constructs. One great way to learn something deeply is to try and explain it and so that’s what I’m going to do.

First step: Counting Fields#

The first thing you need to know about a type before you can gather its fields is exactly how many fields it has. With aggregate types this can be done fairly easily, with a minor wrench tossed in to make it fun. With non-aggregates this technique won’t give you the fields, only the constructor parameters. Furthermore, you wouldn’t get a clean count of parameters, because constructors can be overloaded, and overloads can take the same number of parameters with different types. So counting the fields of a type that can’t be initialized with aggregate syntax is probably not possible this way.

What’s an aggregate type?#

So a reminder of what aggregates are and how they can be initialized is warranted. Aggregates in C++ are types that (as of C++20, because the rules changed):

Have no user-declared or inherited constructors, not even defaulted ones (a defaulted constructor was allowed in C++17).
Have no private or protected direct data members.
Have no virtual functions.
Have no virtual base classes.
Have no protected or private direct base classes.

So then, with the following base class definitions:

struct base0
{
    base0(int i) : x{i} {}
protected:
    int x = 5;
};


struct base1 : base0
{
    int y = 42;
};


struct base2 : protected base0
{
    base2(int i) : base0{i} {}
    int y = 66;
};

These classes are aggregates:

// base0 is not an aggregate because of protected member, but this class has no
// private or protected members of its own.
// base0 also has a user declared constructor, but this class does not.
struct derived0 : base0
{
    int a = 1;
};

// base1 is an aggregate and this class doesn't change that.
struct derived1 : base1
{
    int a = 1;
};

// base2 has a protected base, but this class has no direct protected bases.
// it's an aggregate.
struct derived2 : base2
{
    int a = 1;
};

While these are not:

// has a direct protected base.  is not aggregate.
struct derived3 : protected base2
{
    int a = 1;
};

// has a protected member.  is not an aggregate.
struct derived4 : base0
{
protected:
    int a = 1;
};

// has a user-declared constructor (even a defaulted one counts in C++20).  is not an aggregate.
struct derived5
{
    derived5() = default;
    int a = 1;
};

Furthermore, the rules for aggregate classes apply only to the class itself; the members of an aggregate class do not have to be aggregates. So this class is an aggregate even though std::string has user-defined constructors, private members, and may even have private bases.

struct derp
{
    std::string herp;
};

Their members can even have default initializers.

You can check this all on godbolt.org.
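As a rough sanity check, the classifications above can be written down as static_asserts with std::is_aggregate_v. This is only a sketch that repeats the definitions from this section; it assumes a C++17 or newer compiler, and it deliberately leaves derived5 out because its answer depends on the language version.

```cpp
#include <string>
#include <type_traits>

struct base0
{
    base0(int i) : x{i} {}
protected:
    int x = 5;
};

struct base1 : base0 { int y = 42; };

struct base2 : protected base0
{
    base2(int i) : base0{i} {}
    int y = 66;
};

struct derived0 : base0 { int a = 1; };
struct derived1 : base1 { int a = 1; };
struct derived2 : base2 { int a = 1; };
struct derived3 : protected base2 { int a = 1; };
struct derived4 : base0
{
protected:
    int a = 1;
};

struct derp { std::string herp; };

// The bases themselves: only base1 qualifies.
static_assert(!std::is_aggregate_v<base0>);
static_assert(std::is_aggregate_v<base1>);
static_assert(!std::is_aggregate_v<base2>);

// Only the class's own constructors, members and direct bases matter.
static_assert(std::is_aggregate_v<derived0>);
static_assert(std::is_aggregate_v<derived1>);
static_assert(std::is_aggregate_v<derived2>);
static_assert(!std::is_aggregate_v<derived3>); // direct protected base
static_assert(!std::is_aggregate_v<derived4>); // protected data member

// Non-aggregate members are fine, and arrays are aggregates too.
static_assert(std::is_aggregate_v<derp>);
static_assert(std::is_aggregate_v<int[3]>);
```

If any of these assertions fired, the compiler would be disagreeing with the comments in the examples above.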

Arrays are also aggregates.

Note

With earlier versions of C++ these are not the rules. For example, this class is still an aggregate type in C++17 because it has no user provided constructor (C++20 says user declared):

struct derp
{
    derp() = default;
    int x;
};
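The version dependence can be checked in a few lines. This sketch uses a hypothetical name, defaulted_ctor, for a struct shaped like the one above, and assumes the compiler reports its language mode through __cplusplus (some compilers need a flag for that macro to be accurate).

```cpp
#include <type_traits>

// Hypothetical stand-in for the struct above: a defaulted, user-declared constructor.
struct defaulted_ctor
{
    defaulted_ctor() = default;
    int x;
};

// Anything newer than C++17 applies the "user-declared" rule.
constexpr bool newer_than_cpp17 = __cplusplus > 201703L;

// Aggregate under the C++17 rules, not an aggregate under the C++20 rules.
static_assert(std::is_aggregate_v<defaulted_ctor> == !newer_than_cpp17);
```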

So what?#

The interesting thing for us is that with aggregate types you can use aggregate initialization. This works with or without designated initializers; they are good practice, but we won’t use them here.

For a class definition of:

// With this class definition:
struct outer
{
    int x;
    struct
    {
        int y;
        int z;
    } inner0;
    int inner1[2];
    struct
    {
        int a;
        int b;
    } inner2;
};

// We can do any of the following:
outer a{0,{1,2},{3,4},{5,6}};
outer b{0,{1,2}}; // results in {0,{1,2},{0,0},{0,0}}
outer c{0,1,2,3,4,5,6}; // same result as a
outer d{}; // initialize it all to 0

// And also this:
outer e = {}; // or any of the other variants.
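To make the "results in" comments concrete, here is a small sketch that declares the same objects constexpr and checks their values at compile time. The constexpr qualifiers and the static_asserts are additions for illustration; the initializers are the ones shown above.

```cpp
struct outer
{
    int x;
    struct
    {
        int y;
        int z;
    } inner0;
    int inner1[2];
    struct
    {
        int a;
        int b;
    } inner2;
};

constexpr outer a{0, {1, 2}, {3, 4}, {5, 6}};
constexpr outer b{0, {1, 2}};            // trailing members are left out
constexpr outer c{0, 1, 2, 3, 4, 5, 6};  // braces omitted entirely
constexpr outer d{};

// Members without an initializer end up zeroed.
static_assert(b.inner0.y == 1 && b.inner0.z == 2);
static_assert(b.inner1[0] == 0 && b.inner1[1] == 0);
static_assert(b.inner2.a == 0 && b.inner2.b == 0);

// The flat list fills the same slots as the fully braced one.
static_assert(c.x == a.x);
static_assert(c.inner0.y == a.inner0.y && c.inner0.z == a.inner0.z);
static_assert(c.inner1[0] == a.inner1[0] && c.inner1[1] == a.inner1[1]);
static_assert(c.inner2.a == a.inner2.a && c.inner2.b == a.inner2.b);

// Empty braces zero everything.
static_assert(d.x == 0 && d.inner0.z == 0 && d.inner1[1] == 0 && d.inner2.b == 0);
```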

Getting a field count#

Now that we’ve discussed what an aggregate type is and how that affects the way it is initialized we can build a field counting function fairly easily. Since we don’t know what the field types are yet, we need a generic type that is implicitly convertible to anything. Actually implementing such a conversion is impossible because we don’t know how to create a value of any given type: it may or may not be default constructible, copyable, movable, and so on. None of these are requirements for membership in an aggregate. However, since this will only ever be used in an unevaluated context (inside a requires-expression), the conversion function is never actually called and so doesn’t need a body, much like std::declval:

struct any_type
{
    template < typename T >
    constexpr operator T () noexcept;
};

With this definition we then just check whether a given type can be aggregate initialized with a list of any_type of an incrementing length until we fail. We will use C++20 concepts for this:

template < typename T, typename ... Args >
    requires(std::is_aggregate_v<T>)
consteval std::size_t field_count()
{
    if constexpr (requires { T{Args{}..., any_type{}}; })
        return field_count<T, Args..., any_type>();
    return sizeof...(Args);
}

And we can test it with these types:

struct agg
{
    int x;
    int y;
};

struct aggsubagg
{
    int x;
    agg y;
    agg z;
};

And see that it works:

static_assert(field_count<agg>() == 2);
static_assert(field_count<aggsubagg>() == 3);

You might expect that the field_count function would return 5 for aggsubagg because aggsubagg asa = {0,1,2,3,4}; is totally valid. It doesn’t, though, because our any_type is convertible both to int and to agg. An initializer that can initialize a member directly is used for that whole member, so the compiler initializes each agg in aggsubagg with exactly one any_type argument instead of applying brace omission. The expression aggsubagg asa = {any_type{}, any_type{}, any_type{}, any_type{}, any_type{}}; is illegal for this reason and you’ll get an error saying there are too many initializers.

A problem with this definition#

This works great until we use regular arrays in our aggregate type. Regular arrays are themselves aggregate types so you can use the same brace omission with them as with any other subaggregate. However, in the case of arrays the any_type is not convertible to the target type. Consider the following attempt:

int x[2] = any_type{};

This results in an error something like: error: array must be initialized with a brace-enclosed initializer

This is the same result as if you tried this code that uses a value of the same type:

int x[2] = {};
int y[2] = x;

You can’t initialize one raw array as a copy of another. C++ has no copy initialization for raw arrays, just as it can’t pass them to or return them from functions by value.

So then this aggregate type is going to result in the wrong field count:

struct agg
{
    int x;
    int y[2];
    int z;
};

Here field_count will return 4 rather than 3. This is because {any_type{}, any_type{}, any_type{}, any_type{}} is valid using brace omission, and the conversion that takes place with other kinds of subaggregate doesn’t happen here.

There are two approaches you could take to solving this problem:

Tell everyone to use a structured object, like std::array, rather than using raw arrays. There are great arguments for doing so anyway, and so the fact that this doesn’t work with raw arrays is kind of a whatever.
Or we can fix it.

Fixing for arrays#

There are two libraries I am referencing in learning how this stuff is done: reflect_cpp and glaze. The reflect_cpp method seems a bit overly complex. It first does an initial field_count call to get the maximum number of possible fields. It then iterates over that range and greedily tries to initialize within sub-initializers, meaning a pair of {} within the outer {}. It then counts each maximum as 1. The glaze library does it more simply by using the fact that anything can be initialized with a sub-initializer. In other words:

struct agg
{
    int x;
    int y;
};

agg a{{0},{1}};
// works the same as:
agg b{0, 1};

A simple change to the original function fixes it:

template < typename T, typename ... Args >
    requires(std::is_aggregate_v<T>)
consteval std::size_t field_count()
{
    // the following check is the only change, note the {} around each argument
    if constexpr (requires { T{{Args{}}..., {any_type{}}}; })
        return field_count<T, Args..., any_type>();
    return sizeof...(Args);
}

And now this will work:

struct agg
{
    int x;
    int y[2];
    int z[2];
};

static_assert(field_count<agg>() == 3);

Still broken#

What if a member isn’t default constructible? Well in that case you can’t partially initialize your aggregate without an initializer for that member.

struct inner
{
    inner(int, int) {}
};

struct outer
{
    int x;
    int y;
    inner z;
};

outer a{}; // fails to compile
outer b{0,1}; // fails to compile
outer c{0,1,{2,3}}; // succeeds; inner is not an aggregate, so its braces can't be omitted
outer d{ .z = inner{0,1} }; // succeeds

If we use our field_count implementation on this type we’ll get an answer of 0. This is because the very first check is going to test to see if outer{{any_type{}}} is a valid expression, find that it isn’t, and return the empty pack result. To fix this we need to use the check from reflect_cpp instead of glaze:

template < typename T, typename ... Args >
    requires(std::is_aggregate_v<T>)
consteval std::size_t field_count()
{
    constexpr auto curr = requires { T{{Args{}}...}; };
    constexpr auto next = requires { T{{Args{}}..., {any_type{}}}; };

    if constexpr (curr && !next)
        return sizeof...(Args);
    else
        return field_count<T, Args..., any_type>();
}

One might be tempted here to omit the else, as you’d normally do when each side of a branch returns a value. You can’t do that, though, because then the recursive call is not discarded by the constexpr if and, even though it’s unreachable, the compiler still has to instantiate it. This cascades to infinity. If it’s put into an else alternative then that call is discarded when the if condition is true.

Code availability#

The final result of the coding in this blog is available in our Gitlab. It includes a set of tests for this code and for code in the reflect_cpp and glaze libraries that implement the same functionality but in different ways.

01 April 2024
