C++ Language Support for Pattern Matching and Variants

January 22, 2016 | By David Sankel | Filed in: Uncategorized.

The C++ Programming Language needs a language based variant, or at least P0095R0 argued for it at the 2015 Kona C++ standardization meeting. P0095R0, however, didn’t fully explore generalized pattern matching, which is another desirable feature that is highly related. This post explores some ways to generalize the minimal pattern matching described in P0095R0.

Update 2/25/15. The Jacksonville C++ meeting is only considering proposals for C++17. Given that this is a C++20 paper, we will publish the next revision at the Oulu meeting in June.

Preliminaries

We assume the reader has a basic understanding of both variants and pattern matching as language features. A quick skim of P0095R0 should help in this respect. The intent with this post is to gather some early feedback before presenting the next revision of P0095 at the 2016 Jacksonville C++ committee meeting.

Note that the features described here are not being proposed for C++17, but for the following standard revision.

Review

This section reviews language-based variants.

It’s difficult to imagine going without variants once you have them. We’ve had library-based variants for a while thanks to Boost.Variant and an improved variant library is likely coming with C++17. That’s great news for the C++ community. On the other hand, variant libraries can be difficult to use in practice when it comes to recursion, forward declarations, identifier-based inspection, and creating new types. That’s where variants as a language feature becomes a compelling alternative.

Language based variants (what we’re calling lvariant for now 1) are a lot like simple structs. They both have names and a list of fields with types and identifiers. Consider the following example:

There is a critical distinction in semantics, however. Instances of type user_information, as declared above, consist of either a name or an id. If user_information were defined to be a struct, its values would consist of both a name and an id. When you think of a variant’s members, think either or. When you think of a simple struct’s members, think both and.

The user_information lvariant above could represent a form entry which contains either a user name or a numeric user ID.

It turns out that variants come up frequently in every day programming. Imagine you are writing a data structure representing a git command-line abstract syntax tree (AST) where certain flags are available only when the corresponding command is active. An lvariant can represent the top of the AST.

Instances of git_command_line_parse will either have git init flags (init) or git clone flags (clone).

Lets look at another example. JavaScript Object Notation (JSON) is a common interchange format for web frameworks. An lvariant can implement an in-memory representation of a JSON document:

There are a couple important things to point out in this example.

  1. We achieve recursion using std::unique_ptr. Like structs, lvariant fields can make use of the lvariant‘s name to achieve recursion. Also like structs, the type is “incomplete” and must be wrapped in a pointer of some sort.
  2. std::monostate is a type included in the current language-based variant proposal. In both library and language based variants, std::monostate is used for fields which don’t carry any extra data. Its definition is simply struct monostate {};.

Before we move on, lets define some useful terms for discussing pattern matching and variants in C++. We use the word “piece” to denote a field in a struct. The word “alternative” is used for lvariant fields. The programming language theory savvy will also recognize lvariants to be sum types and simple structs to be product types, although we won’t use that jargon here.

Pattern matching integrals and enums

The most basic pattern matching is that of integral (ie. int, long, char, etc.) and enum types, and that is the subject of this section. Before we get there, however, we need to distinguish between the two places pattern matching can occur. The first is in the statement context. This context is most useful when the intent of the pattern is to produce some kind of action. The if statement, for example, is used in this way. The second place pattern matching can occur is is in an expression context. Here the intent of the pattern is to produce a value of some sort. The trinary operator ?:, for example, is used in this context. Upcoming examples will help clarify the distinction.

Each context uses a different pattern matching keyword. inspect_s is used in statement contexts. inspect_e is used in expression contexts 2.

In the following example, we’re using inspect_s to check for certain values of an int i:

The _ character is the pattern which always succeeds. It represents a wildcard or fallthrough. The above code is equivalent to the following switch statement.

inspect_e can be used to pattern match within expression contexts as in the following example. c is an instance of the color enum:

All we’ve seen so far is a condensed and safer switch syntax which can also be used in expressions. Pattern matching’s real power comes when we use more complex patterns though. We’ll see some of that later.

Pattern matching structs

Pattern matching structs isn’t all that interesting in isolation: they merely bind new identifiers to each of the fields.

struct patterns aren’t limited to binding new identifiers though. We can instead use a nested pattern as in the following example.

While the above code is certainly condensed it also lacks some clarity. It is tedious to remember the ordering of a struct‘s fields. Not all is lost, though; Alternatively we can match using field names.

Pattern matching lvariants

Pattern matching is the easiest way to work with lvariants. Consider the following binary tree with int leaves.

Say we need to write a function which returns the sum of a tree object’s leaf values. Variant patterns are just what we need. A pattern which matches an alternative consists of the alternative’s name followed by a pattern for its associated value.

Assuming we can pattern match on the std::pair type, which we’ll cover later, this could be rewritten as follows.

More complex datatypes

Pattern matching can make difficult code more readable and maintainable. This is especially true with complex patterns. Consider the following arithmetic expression datatype:

We’d like to write a function which simplifies expressions by exploiting exp + 0 = 0 and 0 + exp = 0 identities. Here is how that function can be written with pattern matching.

Here we’ve introduced a new * keyword into our patterns. *(<pattern>) matches against types which have a valid dereferencing operator and uses <pattern> on the value pointed to (as opposed to matching on the pointer itself). A special dereferencing pattern syntax may seem strange for folks coming from a functional language. However, when we take into account that C++ uses pointers for all recursive structures it makes a lot of sense. Without it, the above pattern would be much more complicated.

Pattern matching tuple-like types

Now we have patterns for integrals, enums, simple structs, and lvariants. Is there a way to enable pattern matching for custom data types? The answer, of course, is yes.

Tuple-like types are those which behave a lot like simple structs. These objects represent a sequence of values of various types. std::pair and std::tuple are notable examples. In this section we’ll see how we can annotate custom tuple types for pattern matching.

Pattern matching for tuple-like types is accomplished by overloading the extract operator. Imagine we have a custom pair type that has its m_first and m_second member variables declared private. We overload the extract operator as follows:

The signature of the extract operator function provides both the number of pieces and the type of each piece. The code in the body of this operator overload connects the actual pieces m_first and m_second to their placeholders x and y. This is all the information required for the compiler to use tuple-like objects in pattern matching.

Pattern matching variant-like types

We would also like to generalize matching for variant-like types. Our example is an either template. It is the variant analogue to std::pair.

Of course, the above implementation will pattern match without modification since we are using an lvariant. Let us consider, for the sake of discussion, that the data type was implemented as follows.

To enable pattern matching for this type, we need to implement two operator overloads: discriminator and alternative.

The discriminator operator overload returns a integral or enum value corresponding to the question of which alternative is currently active.

The alternative operator is an overloaded function taking in a single std::variant_piece parameter. The first template argument of std::variant_piece is the type of that alternative. The second template argument is a value of the return type of discriminator 3.

Now we have enough information to use our specialized either class in pattern matching:

Conclusions

Hopefully this post gives you a good idea of the direction we’re headed in with regard to generalized pattern matching and language-based variants. There are still topics to be fleshed out such as the desirability of pattern guards and whether match <pattern>: syntax is preferable to <pattern> =>.

Any thoughts on what’s been presented? Any good ideas for improvement? Constructive comments are very welcome.

Acknowledgements

Thanks to Vincente Botet Escriba, Bjarne Stroustrup, Bengt Gustafsson, and the C++ committee as a whole for productive design discussions. Also, Yuriy Solodkyy, Gabriel Dos Reis, and Bjarne Stroustrup’s prior research into generalized pattern matching as a C++ library has been very helpful.


  1. P0095R0 originally used the syntax enum union for a language based variant. We’re using lvariant here as a placeholder as the committee thought the enum union syntax could create unnecessary confusion. 
  2. P0095R0 originally used switch and case syntax for pattern matching. They’ve been changed here to eliminate confusion as suggested by the committee. 
  3. This syntax will only work if P0127R0 goes into the language, which seems likely. Otherwise, we would need to explicitly specify the discriminator type as in std::variant_piece<T, selection, left>

43 comments on “C++ Language Support for Pattern Matching and Variants

  1. Grigoriy Chudnov says:

    hm.. discriminator and alternative looks too complex.

    Have you considered something like unapply method in Scala? Are there any problems?

    • admin says:

      If I understand it correctly, Scala’s unapply approach used on a custom variant type would make pattern matching consist of several if/else if clauses. As a result, it would be O(n) where n is the number of alternatives in the variant.

      The approach presented in the post allows one to keep O(1) matching for custom variant types.

      • John Skaller says:

        Pattern matching in general form is intrinsically sequential. In some case you can do optimisations but it is hard. So is exhaustion analysis.
        [There was a recent paper on how to do this, but it is impossible with guards anyhow]

        If you want to see how to do fully generalised pattern matching look at Felix, which has user defined patterns. Felix generates C++ (so I personally have been able to do variants in C++ for ages 🙂

        This requires TWO functions: a disciminator which is PREDICATE. Yes it matches, No it doesn’t.

        The second function is the extractor, which get the argument from the value. Only required if there is an argument.

        Patterns in reasonable generality can have predicates as guards. Since the meaning of the guard cannot be determined, guarded match branches MUST be handled sequentially.

        Furthermore, with branch prediction, if/then/else chains are probably FASTER than indexed jumps.

        Example: user defined pattern matching for Chris Okasaki Random Access List:

        //$ User defined pattern matching support
        fun _match_ctor_Cons[T] (x:ralist[T]) =>not ( ralist_empty x);
        fun _match_ctor_Empty[T] (x:ralist[T]) => ralist_empty x;

        fun _ctor_arg_Cons[T] (x:ralist[T]) : T * ralist[T] =

        which allows the usual ML like syntax:

        match x with | Empty => …. | Cons (head, tail) => …

        BTW: before doing variants and pattern you really need tuples (and no, library tuples are NOT good enough).

        • David Sankel says:

          Thanks for your thoughts.

          I agree that general pattern matching with guards is intrinsically sequential. That being said, I feel like allowing the compiler to optimize for the common case where it it isn’t sequential is very important for systems language like C++.

          Even when some guards are used in a particular inspect statement, the compiler should have the ability to make O(1) jumps where it can. Discriminators as predicates prevents this.

          Can you substantiate your claim that if/else chains are faster than indexed jumps with some performance benchmarks?

          Also you claim that first class tuples are a prerequisite to variants and pattern matching. Can you elaborate on that? It isn’t immediately obvious to me.

          • John Skaller says:

            @David, no i cannot substantiate my claim if/then/else chains are faster. It’s likely to be architecture dependent. Most modern Intel CPU’s have pipelining and speculative branching which would be defeated by a computed jump. Obviously there’s a complex tradeoff related to the number of cases in the branch.

            My point would be, do not design the pattern matching semantics so they’re restricted by this requirement. IMHO it’s much more important to get good general semantics into a language. Do not forget that a compiler can *calculate* in some cases when a computed jump can replace an if/then/else chain: I would guess clang can do that (but I’m not sure).

            It’s possible given an if/then/else chain that more benefit would be derived from lifting the conditionals out of the code so they’re all in one place with an extra jump to the match handler. One extra jump to code not in the cache, to ensure the sequence of tests are done by instructions which *are* all cached. Again, architecture dependent.

            The thing is a lot of pattern matches cannot use a jump on the outside cases anyhow. For example (in Felix/ML notation sorry)

            match x with | 1 => .. | 3 => .. | 27 => ..

            which is just a C switch. There’s no choice. You have to use an if then else chain.

  2. Vicente Botet Escriba says:

    I like the direction this is taking. Next follow some comments

    * First some typos or possible ambiguities

    Shouldn’t sum be used instead of sum_expression in


    expression simplify( const expression & exp ) {
    return inspect_e( exp ) {
    sum_expression {*(literal 0), rhs} => simplify(rhs)
    sum_expression {lhs , *(literal 0)} => simplify(lhs)
    _ => exp
    };
    }

    In the either user class


    T& left() {
    assert( m_selection == left );
    return m_left; // (*)
    }

    Don’t the enumerator left and the getter left() conflict?

    * About the operator discriminator

    This is a new kind of operator, as it is not a conversion operator. This introduce a new discriminator keyword as the user can have already a discriminator type.

    Why not require a discriminator alias and a simple conversion


    using discriminator = selection;
    operator discriminator() {...}

    What would be wrong with a free function?

    * About the operator alternative

    This is a new kind of operator.
    What is wrong with the variant-like access variant_size and get(Var)?

    * About the operator extract

    This is a new kind of operator. What is wrong with the tuple-like access tuple_size and get(Tpl)?

    * About inspect cases scope

    Given


    lvariant tree {
    int leaf;
    std::pair< std::unique_ptr, std::unique_ptr > branch;
    }

    int sum_of_leaves( const tree & t ) {
    return inspect_e( t ) {
    leaf i => i
    branch b => sum_of_leaves(*b.left) + sum_of_leaves(*b.right)
    };
    }

    the leaf symbol is local to tree. Which magic makes it available inside inspect_e(t)?

    * About the use of unique_ptr to reference recursive variants

    Given the recursive variant example


    lvariant tree {
    int leaf;
    std::pair< std::unique_ptr, std::unique_ptr > branch;
    }

    When the branch alternative is used the unique_ptr can not be nullptr. I believe that a more constraining value type class must be used instead, something like boost::recursive_wrapper. recursive_wrapper is a value type class that should be implicitly convertible to tree&.


    lvariant tree {
    int leaf;
    std::pair< recursive_wrapper, recursive_wrapper > branch;
    }

    int sum_of_leaves( const tree & t ) {
    return inspect_e( t ) {
    leaf i => i
    branch {left, right} => sum_of_leaves(left) + sum_of_leaves(right)
    };
    }

    * Recursive matching

    I we use recursive_wrapper instead of unique_ptr, there will be no need to use the operator* as recursive_wrapper is a value type, isn’t it?

    E.g.


    struct sum_expression {
    recursive_wrapper left_hand_side;
    recursive_wrapper right_hand_side;
    };

    expression simplify( const expression & exp ) {
    return inspect_e( exp ) {
    sum {(literal 0), rhs} => simplify(rhs)
    sum {lhs , (literal 0)} => simplify(lhs)
    _ => exp
    };
    }

    * About *()

    IIUC, the need for the operator* is associated to unique_ptr.
    Could the unique_ptr by nullptr? and if yes, how the user could check it.

    If a variant is considered a value type, I don’t know if this particular case would merit the additional syntax.

    Vicente Botet

    P.S. Please be free to fix the format of this comment if the code is unreadable

    • David Sankel says:

      > Shouldn’t sum be used instead of sum_expression?

      Fixed.

      > Don’t the enumerator left and the getter left() conflict?

      Fixed.

      > Why not require a discriminator alias and a simple conversion
      > using discriminator = selection;
      > operator discriminator() {…}
      > What would be wrong with a free function?

      I’m not sure what you mean here. There needs to be a way for the system to query the discriminator value and its type so it seems like a function is a good way to go.

      Do you have any benefits in mind if we were to make the discriminator operator a free function instead of a member function?

      > What is wrong with the variant-like access variant_size and get(Var)?

      I don’t follow you here. Same with your related comment on the extract operator.

      > The leaf symbol is local to tree. Which magic makes it available inside inspect_e(t)?

      Great question. I think it is a flaw that it invokes magic. It probably would be better to require the pattern constructs to be scoped properly as it was in the original paper.


      int sum_of_leaves( const tree & t ) {
      return inspect_e( t ) {
      tree::leaf i => i
      tree::branch b => sum_of_leaves(*b.left) + sum_of_leaves(*b.right)
      };
      }

      > I believe that a more constraining value type class must be used instead, something like boost::recursive_wrapper.

      I don’t think that introducing another concept here is necessary as it is with a library-based variant. The ability to denote a not-null pointer in C++ is an orthogonal problem that other proposals are attempting to solve.

      The ‘not_null’ type in the C++ Core Guidelines would solve this cleanly, for example.


      lvariant tree {
      int leaf;
      std::pair< std::not_null< std::unique_ptr >,
      std::not_null< std::unique_ptr > > branch;
      }

      > Could the unique_ptr be nullptr? and if yes, how the user could check it.

      Great question. I think using ‘nullptr’ as a pattern that matches against all pointer types would work.

      • Vicente Botet Escriba says:

        My question is: What is the rationale of creating 3 new specific operators (extact, discriminator and alternative) and not use a library solution?

        The creation for new specific operators doesn’t seems necessary.

        why a new operator

        selection operator discriminator();

        why not a member function

        selection discriminator();

        or a free function?

        Why a new operator

        operator alternative( std::variant_piece x )

        why not a member function

        void alternative( std::variant_piece x)
        or
        T& alternative(variant_discriminator)

        or a free function.

        T& get(variant_discriminator, either&);

        IMO, when an efficient library solution exists it is worth using it instead of adding new features to the language except if the library solution is not friendly. Do you consider the tuple-like access not friendly as customization point?

        • David Sankel says:

          Thanks for helping me understand what you meant here.

          I did some investigation into using the tuple interface as an extension point instead and I judged the complexity to be beyond the reach of the everyday Joe C++ developer.

          • Vicente Botet Escriba says:

            After some thought, using specific operators would allow to define the language feature independently of the library.

            However I don’t see how the extract operator could take in account the following case. If we want to extract 3 elements from int v[3] as PO144 does, this should be predefined as we can not define the extract function.

          • Vicente Botet Escriba says:

            After some thought, using specific operators would allow to define the language feature independently of the library.

            Have you think on having two operators for product types, one to obtain the number of elements and the other to obtain the nth element? This could help to have a more efficient implementation when the pattern ignores a specific element or when the pattern use specific names.

          • David Sankel says:

            I don’t see how having two operators for product types is going to have any effect on either compile-time or run-time efficiency. The compiler can inspect the number of arguments for that function without issue IIUC.

          • Vicente Botet Escriba says:

            In addition to Pattern matching, we need an alternative and efficient interface to access the elements of a tuple-like type. We can not define std::apply(f, tpl) with pattern matching, isn’t it?

            Currently we have get(t) and the tuple_size. What would be the interface for the library algorithms developer after the introduction of patter matching?

          • David Sankel says:

            This is a good segue to your upcoming proposal :).

      • Vicente Botet Escriba says:

        >> The leaf symbol is local to tree. Which magic makes it available inside inspect_e(t)?

        > Great question. I think it is a flaw that it invokes magic. It probably would be better to require the pattern constructs to be scoped properly as it was in the original paper.

        int sum_of_leaves( const tree & t ) {
        return inspect_e( t ) {
        tree::leaf i => i
        tree::branch b => sum_of_leaves(*b.left) + sum_of_leaves(*b.right)
        };
        }

        I was expecting some magic 🙁

        What do you think of evaluating the pattern expression in the context of the inspected variable? This should work at least for variants.

        I should work also for struct fields as in

        inspect_s(p) {
        {hitpoints:1}

        Why use : instead of ==?

        Maybe OT, do we want to have patterns such as hitpoints>1?

        • David Sankel says:

          > What do you think of evaluating the pattern expression in the context of the inspected variable? This should work at least for variants.

          I think if we go down that route, we’d also be bringing private variables and other unwanteds into scope. I can see this causing problems.

          > Why use : instead of ==?

          The thing on the right of a : is a pattern in the struct field example. This doesn’t map directly to value-semantic equality which is what people generally think == refers to.

          > Maybe OT, do we want to have patterns such as hitpoints>1?

          I think this might be interesting and is what I was alluding to in the conclusion when I mentioned guards.


          inspect_s( e ) {
          exp::sqrt {exp::literal n} if (n<0) =>
          std::cout < < "Warning: you're asking for a runtime error" << std::endl; }

          • Vicente Botet Escriba says:

            Agreed for the unwanted private variables. I believe however we need something to avoids qualification.

            Agreed, n>1 could be managed with guards (I missed this part).

  3. Vicente Botet Escriba says:

    I would like to see inspect to be variadic so that matching multiple arguments is possible.

    Structured binding should be a particular case of pattern matching.

    auto {x,y,z} = f();

    Structured binding supports doing it by cv references.

    auto const= f();

    This would mean that the user must be able to say if the capture of variables of the pattern is done by copy or by reference. How could the user do that with your ongoing proposal?

    • David Sankel says:

      > I would like to see inspect to be variadic so that matching multiple arguments is possible.

      What do you see as the benefits of adding this feature when compared with using, say, std::make_tuple?

      > Structured binding should be a particular case of pattern matching.

      I agree that this would be nice.

      > This would mean that the user must be able to say if the capture of variables of the pattern is done by copy or by reference. How could the user do that with your ongoing proposal?

      Good question. I haven’t given it much thought until now. I suppose there isn’t much difficulty in allowing type qualifications before identifiers in patterns. Structured binding would default to value types where binding within inspect would default to either && or const & depending on the const-ness of the value being matched.

      This brings up another question. What do we do when a pattern doesn’t match with structured binding? For example:


      // What happens here?
      auto {1, name} = std::pair(2, "Jonny");

      Maybe a throw of a std::match_exception or something of the sort?

      • Vicente Botet Escriba says:

        Why request make_tuple, when making inspect variadic is clearly more friendly and not too much more complex?

        Structured bindings accepts (up to now) only variables no literals and there is no support for &&.

        I believe other kind of pattern matching should be reserved to inspect.

      • John Skaller says:

        When pattern matching there are three interesting conditions:

        1. A pattern match cannot fail.
        2. The matching isn’t exhaustive (can fail)
        3. A branch follows a match which cannot fail

        If the match could fail, you add a wildcard branch throwing an exception. If the user specifies it with a compiler flag, you can issue a warning.

        If a branch exists which cannot be taken, it is not a good idea to always issue a warning. Felix compiler actually generates such matches and the user would get confused being warned about generated construct. Also programmers may add a premature wildcard deliberately during development or debugging.

        In an if/then/else chain, a terminal wildcard throwing an exception does not cost any time (only code space). And you have to do *something*. Felix aborts the program. Ocaml throws. C++ should throw IMHO.

  4. Utkarsh says:

    Interesting. I think pattern matching for tuples can make working with them a lot easier. Will save a lot of variadic template recursion and enable ifs.

  5. Alex says:

    Will it be possible to use it for SFINAE or concepts check, like testing if the class could be matched by some pattern?

    • David Sankel says:

      This hasn’t been explored yet. My intuition tells me that someone could use it for matching classes (vs. instances) using something clever along the lines of Boost.Hana.

  6. Alex says:

    Wouldn’t _ symbol be ambiguous with some libraries, like google mock?

    • David Sankel says:

      That could very well be. Ideally we’d like a symbol that has minimal conflict with other libraries and, at the same time, helps make the code clear.

    • Matt Godbolt says:

      For what it’s worth all sensible libraries (including Google Mock) put their operator _ in a namespace: In GMock’s case it’s testing::_ (though most people put using namespace testing;). So i don’t see it as being a practical problem.

  7. Vicente Botet Escriba says:

    More typos:
    * Replace left/right by first/second

    branch b => sum_of_leaves(*b.left) + sum_of_leaves(*b.right)

    * replace return left/right by return m_left/m_right

    T& get_left() {
    assert( m_selection == left );
    return left;
    }

    U& get_right() {
    assert( m_selection == right );
    return right;
    }

  8. Zhihao Yuan says:

    inspect_e can be replaced by an inspect_s inside a lambda. And it might be beneficial if the lambda body can be replaced by an inspect statement.

    • David Sankel says:

      Interesting thought Zhihao. Could you give an example where it might be beneficial to do so?

      I’m not sure that inspect_e can *always* be replaced by an inspect_s since not all types can be captured by a lambda.

      • Zhihao Yuan says:

        So that you can pass something like this around. This structure presents in OCaml.

        [=, v = expr] inspect(auto&& i) {
        e => ...
        _ => ...
        }

        What type cannot be captured by lambda? It even gives you a place to define shorthand variables, which is what you can’t directly do with inspect_e.

        • David Sankel says:

          > What type cannot be captured by lambda? It even gives you a place to define shorthand variables, which is what you can’t directly do with inspect_e.

          You are right. My mistake. I was thinking about the kinds of values that can be returned from a lambda, but even that is moot now that we have required return value optimization.

          Are you thinking it may be better to only have inspect_s and omit inspect_e altogether?

  9. Lilian says:

    Cram in more, that will definitely make the language better and simpler…
    Also, seems like almost exact copy of how it is done in Rust…

  10. E Browne says:

    I’m sorry, I must be missing something very simple. How is lvariant different from union? I see the committee previously rejected enum union — why was even that required?

    Also, why not keep using the case(x): syntax? Just to avoid fall-through semantics? I don’t understand why the committee felt new syntax would reduce confusion?

    • David Sankel says:

      lvariant is different from union in that values of an lvariant encode which field is currently active whereas values of unions do not encode that information.

      The committee rejected the name “enum union”, but not the concept. “enum union” was thought to create too much confusion since “enum” and “union” are already well-understood keywords.

      The “case” syntax was rejected for similar reasons. In this case there was a concern that new people would see ‘case’ being used where no ‘break’ was necessary and forget to use ‘break’ when it actually is necessary.

  11. E Browne says:

    I should clarify my comment, and either further expose my ignorance or make explaining the background already tread easy. Why not extend union such that if no tag is manually specified in the members, the member name becomes the tag? If there are shared attributes of the same type and name, that is assumed the tag, and union is treated as legacy?

    • David Sankel says:

      lvariants use the member name as the tag as you suggest. Folks still have a need for plain unions though. Making unions act like lvariants would get a lot of resistance in the committee since it would change the meaning of old code. For example, the size of unions would increase.

  12. John Skaller says:

    Re real tuples: if you do not have real tuples known to the compiler, you have a blob type. The compiler than is unable to extract components with projections. How would you match a pair? You have to match with the two patterns:

    match e with px,py => …

    For the branch to be taken, px must match the first component of e, and py the second component. How can you even perform the test if e is just a blob type? (Abstract type). The problem is that you have used structural typing in the match, and the argument must be structurally typed as well.

    User defined matching can solve this. But that’s what I recommended: you must have separate functions to (a) check if the pattern matches and (b) extract the pieces if it does. Then the construction is supposed for any type in the library that cares to define how it can be pattern matched.

    Consider:

    match (a,b) with
    | 1,v1 => ..
    | 1,v2 =>
    | 1,v3 => …

    where v1/v2/v3 are cases of a variant, to optimise this you would do a if/then/else on the first component, then in the true branch equal 1, you can do a computed jump on the variant index for the second branch. You can’t do that unless the type of the tuple second component is a known variant type and that means you also need to know the type of the argument. You have to know its a tuple and what a tuple is: tuples have to be built in to the language.

    Vastly .. and I mean VASTLY in big fat capital letters .. more important than worrying about optimisation is being able to do compile time exhaustion analysis. This is actually quite hard but algorithms exist in the academic literature.

  13. John Skaller says:

    SO basically you have two choices:

    Introduce lvariants and low level inspect. These are simple and easy and you can use a computed jump for performance if it is faster than conditional chain. However it isn’t general pattern matching and can’t easily be extended to it because it allows only a single jump-on-tag operator plus a single extract argument operator for each variant constructor (which is probably universal plus a cast).

    Or you can introduce general pattern matching with two operators per branch: one to check if the pattern is matched and one to extract the required pieces. This generalises to ANY data type with user defined checkers and extractors.

    The first extension may be more likely to pass the committee, which probably doesn’t have much understanding of structural typing and algebraic data types and the wealth of theoretical research into it. However the more general proposal makes the extension significant enough to be seriously worthwhile because it can be retrofitted to ANY data type including tuples in the library, any user defined structs, and even classes with virtual dispatch stuff.

    Pattern matching in general form is intended to make programming easier. At the core the lvariant decoding is the real functionality, but people using languages with general pattern matching use it heavily. In fact I would say over 50% of all code in most programs is pattern matching. One must remember that “if/then/else” is nothing but a special case of a pattern match, and bool is nothing but a 2 case lvariant with no arguments to the constructors true and false. In fact in Felix this is literally the case: all conditionals are desugared to pattern matches (the code generator converts all pattern matches back to if then else chains .. because it is generating C++ which doesn’t have pattern matching 🙂

    I’m not on the committee more but I would guess you will have hard time with either proposal. I proposed variants myself for the 89 Standard, but at that point few had any idea what a sum type was. Given that Boost variant is completely wrong, all I can say is: Good luck! lvariants are the theoretically correct machinery.

  14. John Skaller says:

    Here is an approach to think about:

    1. map pattern matches to if then else chains, forget about computed jumps.

    2. implement an optimisation in the compiler that recognises if then else chains which decode a compact subrange of integer values and optimise that to a range check followed by a computed jump.

    The range check can be elided if the type is recognised as a subrange of int, even if the user cannot define such types the variants can use them for the variant tags.

    The advantage of decoupling these things is that the optimisation can apply to ANY code not just inspects. It will automatically consider the one case C++ already understands: booleans (actually one might consider char as well, 256 case jumps may be reasonable on modern computers).

    In addition, it can be used to re-encode sum types. For example consider:

    if x then
    if y then .. else ..
    else
    if y then ..
    else ..

    But this can be done in a SINGLE computed jump on value x + 2 * y. And there is no “inspect” statement there: the optimisation is more general, it doesn’t care how the branches are created. It works for generated ones and cases where the user wrote the code out by hand.

Leave a Reply

Your email address will not be published. Required fields are marked *