Writing functional code in C++ IV – Algebraic datatypes

-

And here comes the guilt bit. I have the strong sus­pi­cion (but not cer­tainty) that what I am do­ing here can be done with tem­plates, but did­n’t take the time to do it. With that out of the way, let’s go.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for re­view­ing it.

Algebraic datatypes (discriminated unions in F#) are a pow­er­ful con­cept in func­tional pro­gram­ming. They are the main way to rep­re­sent type vari­a­tion in your pro­gram. Very roughly, where ob­ject ori­en­ta­tion uses de­riva­tion, func­tional pro­gram­ming uses al­ge­braic datatypes. An en­tire book could be writ­ten on the the­ory of this, but the goal of this post is to see how we can map them to C++ with­out loos­ing C++ness.

When talk­ing about this with C++ pro­gram­mers, they al­ways point me to boost vari­ant. That does­n’t do it for me for sev­eral rea­sons.

First of all, boost vari­ants rep­re­sent one of a fixed col­lec­tion of types. Algebraic datatypes rep­re­sent one of a fixed col­lec­tion of named types. That means that a sim­ple thing like the code be­low can­not be rep­re­sented as vari­ant:

type LivingEntity =
| Person of string  // name
| Dog of string     // name

Ok, ok maybe you could rep­re­sent it by typifing’ things us­ing boost strong type­def, but things get ugly syn­tac­ti­cally. Moreover, a lot of time the name is all you care about …

type Switch = On | Off

Are we go­ing to strong type­def for such a sim­ple thing? Oh boy. Even ac­cept­ing the syn­tac­tic weight of all this, there are other is­sues in play. Discriminated unions are used ex­ten­sively in func­tional pro­grams. So you want a nice syn­tax to work with them Something like the be­low F# syn­tax:

let print living =
    match living with
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

Which could be made even sweeter by us­ing the function’ key­word as be­low:

let print = function
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

In boost vari­ant, you ei­ther use the get func­tions or you write a vis­i­tor func­tion. In the first case you are prob­a­bly go­ing to write a chain of if’ state­ments or a switch’ state­ment. Both are con­fus­ing and come with plenty of syn­tac­tic weight. I don’t re­ally want to write a vis­i­tor like the one be­low for each match’ in my code. The magic is gone.

class times_two_visitor
    : public <a href="http://www.boost.org/doc/libs/1_49_0/doc/html/boost/static_visitor.html">boost::static_visitor</a><>
{
public:
    void operator()(int & i) const
    {
        i *= 2;
    }
    void operator()(std::string & str) const
    {
        str += str;
    }
};

Ok, so boost vari­ant does­n’t re­ally work for this. Remember that our over­ar­ch­ing goal was to stay close to C++. The lan­guage it­self has some­thing that comes pretty close to what we want in the form of a union, or bet­ter a tagged union. Again, the types are not named, but maybe we can work that in.

It turns out that Jared here did all the hard work. The gen­eral idea is to use macros to hide the con­struc­tion of a tagged union with meth­ods to test the type and re­turn the con­tained value.

For ex­am­ple this code:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,       string)
DU_END

Becomes some­thing like:

struct LivingEntity {
    private:
        LivingEntity() {}
        unsigned int m_kind;
    public:
        static LivingEntity Person(const string& value) {
            LivingEntity unionValue;
            unionValue.m_kind = 19;
            unionValue.m_Person = value;
            return unionValue; }
        bool IsPerson() const {
            return m_kind == 19;
        }
        const string& GetPerson() const {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
        string GetPerson() {
            (void)( (!!(m_kind == 19)) || (_wassert(L"m_kind == __LINE__", L"c:discriminated_union.cpp", 19), 0) );
            return m_Person; }
   private:
string m_Person;

You can see the out­line of a tagged union (i.e. m_kind) with a con­struc­tor for each type (i.e. Person) and meth­ods to test for at type and re­turn its value. You can also see the stor­age for the value (i.e. m_Per­son).

The only thing in DU_DECLARE that is dif­fer­ent from Jared’s so­lu­tion is the type­def be­low that al­lows not re­peat­ing the LivingEntity name in each DU_VALUE.

#define DU_DECLARE(name)                        \
    struct name {                               \
    private:                                    \
        typedef name unionName;                 \
        name() {}                               \
        unsigned int m_kind;                    \
    public:
#define DU_VALUE(entryName, entryType)                                                                      \
        static unionName entryName(const entryType& value) {                                                \
            unionName unionValue;                                                                           \
            unionValue.m_kind = __LINE__;                                                                   \
            unionValue.m_##entryName = value;                                                               \
            return unionValue;  }                                                                           \
        bool Is##entryName() const { return m_kind == __LINE__;}                                            \
        const entryType& Get##entryName() const { assert(m_kind == __LINE__); return m_##entryName; }       \
        entryType Get##entryName() { assert(m_kind == __LINE__); return m_##entryName; }                    \
    private:                                                                                                \
        entryType m_##entryName;                                                                            \
    public:

With all of that at your dis­posal it be­comes easy to write:

    auto entity = LivingEntity::Dog("Bob");
    DU_MATCH(entity)
        DU_CASE(Dog,   BOOST_CHECK_EQUAL(value, "Bob");)
        DU_CASE(Person,BOOST_CHECK(false);)
    DU_MATCH_END

There are some beau­ti­ful things about this. First of all, the con­struc­tion of any of such types is su­per sim­ple. You even get in­tel­lisense!

Moreover the value’ vari­able con­tains what­ever was passed in the con­struc­tor for the ob­ject. So this is se­man­ti­cally equiv­a­lent, if not syn­tac­ti­cally, to the match state­ment in F#.

Obviously the code part is not lim­ited to a sin­gle in­struc­tion:

    DU_MATCH(entity)
        DU_CASE(Dog,
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        )
        DU_CASE(Person,
            BOOST_CHECK(false);
        )
    DU_MATCH_END

And for those of you ad­dicted to braces, I ven­ture:

    DU_MATCH(entity)
        DU_CASE(Dog,
        {
            cout << "I should be here";
            BOOST_CHECK_EQUAL(value, "Bob");
        })
        DU_CASE(Person,
        {
            BOOST_CHECK(false);
        })
    DU_MATCH_END

They all work with the same macro de­f­i­n­i­tion. They ex­pand to some­thing along the line of:

    if(false) {}
        else if(entity.IsDog()) {
            auto value = entity.GetDog();
            BOOST_CHECK_EQUAL(value, "Bob");
        }
        else if(entity.IsPerson()) {
            auto value = entity.GetPerson();
            BOOST_CHECK(false);
        }
        else {
            throw match_exception();
        }

I’ve not reached the pin­na­cle of macro nam­ing mas­ter­ing with this one. Making them low­er­case and risk­ing a bit more on the con­flict side would make the syn­tax much more palat­able. I call it, as it is, not too bad.

The last else’ clause as­sures you then if you add a new type to the dis­crim­i­nated union and for­get to up­date one of the MATCH clauses at least you get a run time er­ror. That is not good. Functional lan­guages would give you a com­pile time er­ror, which is much bet­ter. Maybe with ju­di­cious use of tem­plates you can bring the er­ror at com­pile time.

The macros are triv­ial:

class match_exception: std::exception {};
#define DU_MATCH(unionName) { auto du_match_var = unionName; if(false) {}
#define DU_CASE_TAG(entry, ...)                        \
    else if(du_match_var.Is##entry()) {                \
        __VA_ARGS__                                    \
    }
#define DU_CASE(entry, ...)                            \
    else if(du_match_var.Is##entry()) {                \
        auto value = du_match_var.Get##entry();        \
        __VA_ARGS__                                    \
    }
#define DU_DEFAULT(...)                                \
    else if(true) { __VA_ARGS__}
#define DU_MATCH_END else {throw new match_exception();} }

Let’s now go back to our ini­tial goal and see how far off we are. We were try­ing to do the fol­low­ing:

type LivingEntity =
| Person of string
| Dog of string
let print = function
    | Person(name) -> printfn "I'm a per named %s" name
    | Dog(name)    -> printfn "I'm a dog named %s" name

And here is what we ended up with:

DU_DECLARE(LivingEntity)
    DU_VALUE(Person,    string)
    DU_VALUE(Dog,        string)
DU_END
auto print(LivingEntity en) -> void {
    DU_MATCH(entity)
        DU_CASE(Dog,    cout << "I'm a dog named " << value;)
        DU_CASE(Person, cout << "I'm a per named " << value;)
    DU_MATCH_END
}

In our Switch case:

type Switch = On | Off

You get the good look­ing :

DU_DECLARE(Switch)
    DU_FLAG(On)
    DU_FLAG(Off)
DU_END

And along the way we lost com­pile time type safety in the very com­mon case of adding new types to the dis­crim­i­nated union. That’s bad.

Also some of you would strongly dis­like the (ab)use of macros. As for me, it looks work­able.

Tags

4 Comments

Comments

Cl&#233;ment

2012-07-21T11:27:20Z

Amusing so­lu­tion! I’d like to ex­pend on the first few lines of your ar­ti­cle though:
Very roughly, where ob­ject ori­en­ta­tion uses de­riva­tion, func­tional pro­gram­ming uses al­ge­braic datatypes.
Expanding on this, you could im­ple­ment your LivinEntity type like this:

#include
using namespace std;
class LivingEntity {
public:
string name;
virtual void print() = 0;
LivingEntity(string name) {
this->name = name;
}
class Person;
class Dog;
};
class LivingEntity::Person : LivingEntity {
public:
Person(string name) : LivingEntity(name) {}
void print() {
cout << "I am a per named " << name << endl;
}
};
class LivingEntity::Dog : LivingEntity {
public:
Dog(string name) : LivingEntity(name) {}
void print() {
cout << "I am a dog named " << name << endl;
}
};
int main()
{
LivingEntity::Dog a_dog = LivingEntity::Dog("a dog");
LivingEntity::Person a_person = LivingEntity::Person("a person");
a_dog.print();
a_person.print();
return 0;
}

Also, why do you use else if(true) { __VA_ARGS__} in­stead of just else { __VA_ARGS__} in your macros?
Thanks for the nice ar­ti­cle!

Hi Clement,
Yes, you can sim­u­late al­ge­braic datatypes with in­her­i­tance and type test­ing, al­beit more ver­bosely and less ef­fi­ciently. The if(true) thing is just a bug. Probably at some point in the cod­ing the con­di­tion was a bit more in­ter­est­ing.

Christoph Senjak

2017-02-02T03:30:51Z

Does this also work with re­cur­sive datatypes?