Rust for C++ developers part 4: Enums, destructuring and pattern matching

Rust enums are similar to the ones in C++ with one important difference: Rust enums allow you to bundle additional data with each variant. This feature, combined with destructuring and pattern matching, is very powerful. What I find interesting is that none of these features is special on its own, but combining them allows you to write very elegant code. In my opinion they are some of the coolest features of Rust. Let's start with enums.

Enums

We can declare an enum like this:

enum OpMode {
    Server,
    Client,
}

Each enum represents a new type, so you can, for example, use an enum as a function parameter. A function like fn start(mode: OpMode) {} can be invoked with a concrete enum variant: start(OpMode::Server);.
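To make this concrete for C++ readers, here is a small sketch of a fieldless enum used as a function parameter. The derives, the string returned from start and the cast to i32 are my additions for illustration; the original start has an empty body.

```rust
// A fieldless enum: the closest Rust relative of a C++ enum class.
// The derives are optional; they let us print, copy and compare values with ==.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpMode {
    Server,
    Client,
}

// OpMode is a type of its own, so it can be used as a parameter type.
// The return value is only there so the behaviour is easy to check.
fn start(mode: OpMode) -> &'static str {
    if mode == OpMode::Server {
        "starting in server mode"
    } else {
        "starting in client mode"
    }
}

fn main() {
    println!("{}", start(OpMode::Server));
    // As in C++, fieldless variants can be cast to their integer discriminant,
    // which starts at 0 by default.
    println!("{:?} = {}", OpMode::Client, OpMode::Client as i32);
}
```

Comparing with == only works because of the PartialEq derive; without it the comparison does not compile.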

Attaching data to each enum variant is done like this:

enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

First, note that Server and Client have different data attached to them.

Second, note that there are two ways to attach additional data to an enum variant. Server uses struct-like syntax, which means it has named fields, just like a struct. Client uses tuple-like syntax: there are no field names and each field is identified by its position. Named fields are better when there are a lot of parameters; tuples are a good fit for self-explanatory data (e.g. complex numbers, coordinates). Both are used in practice.
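A minimal sketch of constructing both kinds of variants. It assumes both Client fields are u32, as in the later examples of this post; the port numbers and the sample_modes helper are made up for illustration.

```rust
#[allow(dead_code)] // the fields are never read in this sketch
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

// Hypothetical helper that builds a few OpMode values.
fn sample_modes() -> Vec<OpMode> {
    // Struct-like variant: fields are given by name, in any order.
    let server = OpMode::Server { max_conns: 10, port: 80 };
    // Tuple-like variant: values are given by position.
    let client = OpMode::Client(8080, 1024);
    // A tuple-like variant can also be used as a plain constructor function.
    let make_client: fn(u32, u32) -> OpMode = OpMode::Client;
    // All values have the same type, so they fit in one Vec.
    vec![server, client, make_client(9090, 512)]
}

fn main() {
    println!("built {} modes", sample_modes().len());
}
```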

The next logical question is: if each variant has its own data attached to it, how do we access that data? To answer this we need two more concepts from Rust: destructuring and pattern matching.

Destructuring

Destructuring means binding a variable to each element or field of a type that contains multiple elements (tuples, arrays) or multiple fields (structs, enum variants). This is probably too vague, so let's see some examples. We'll use a tuple to demonstrate the syntax and then show examples with other data types.

Destructuring a tuple means to bind its values to separate variables. For example:

fn main() {
    let data = (1, 2, 3);
    let (first, second, third) = data;
    assert_eq!(first, 1);
    assert_eq!(second, 2);
    assert_eq!(third, 3);
}

Here data is a tuple with three elements. We want to destructure it into three separate variables: first, second and third. Have a look at the second line of main. On the left side of the assignment we use let and list the three variables in parentheses, just as we would write a tuple. On the right side we have the tuple we want to destructure. Each value in the tuple is bound to the corresponding variable on the left side. assert_eq! is a Rust macro which panics if its two arguments are not equal. It is usually used in tests, but it's also handy in code examples like this one.

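Also note that destructuring moves the values out of data. If the elements are not Copy (e.g. String), data can no longer be used as a whole afterwards; integers are Copy, so in this particular example they are simply copied. Of course we can borrow instead of move: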

fn main() {
    let data = (1, 2, 3);
    let (first, second, third) = &data;
    assert_eq!(*first, 1);
    assert_eq!(*second, 2);
    assert_eq!(*third, 3);
}

In this case we borrow the elements of data, so first, second and third are references (&i32) and data remains usable.
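The difference between moving and borrowing only becomes visible with types that are not Copy. Here is a small sketch using String instead of integers; the function names join and total_len are hypothetical.

```rust
// Takes ownership of the tuple; destructuring by value moves both Strings out.
fn join(data: (String, String)) -> String {
    let (first, second) = data; // data is moved from here on
    first + &second
}

// Destructuring a reference only borrows: first and second are &String.
fn total_len(data: &(String, String)) -> usize {
    let (first, second) = data;
    first.len() + second.len()
}

fn main() {
    let data = (String::from("foo"), String::from("bar"));
    println!("{}", total_len(&data)); // data is only borrowed here...
    println!("{}", join(data)); // ...and moved here
    // println!("{:?}", data); // would not compile: data has been moved

    // i32 is Copy, so destructuring copies and the tuple stays usable.
    let numbers = (1, 2, 3);
    let (first, _, _) = numbers;
    println!("{} {:?}", first, numbers);
}
```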

If we are not interested in some of the elements, we can skip them with _. For example, if we want only the second element we can do:

fn main() {
    let data = (1, 2, 3);
    let (_, second, _) = data;
    assert_eq!(second, 2);
}

With .. we can skip all the remaining elements. For the tuple in our example:

fn main() {
    let data = (1, 2, 3);
    let (first, ..) = data;
    assert_eq!(first, 1);
}
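.. is not limited to the end of a pattern. It can also skip elements at the beginning or in the middle, but it may appear only once per tuple pattern (otherwise it would be ambiguous how many elements each .. covers). A short sketch; ends is a made-up helper:

```rust
// Hypothetical helper: keeps the first and the last element, skips the middle.
fn ends(data: (i32, i32, i32, i32, i32)) -> (i32, i32) {
    let (first, .., last) = data;
    (first, last)
}

fn main() {
    let data = (1, 2, 3, 4, 5);
    // `..` at the start skips everything before the last element.
    let (.., last) = data;
    assert_eq!(last, 5);
    assert_eq!(ends(data), (1, 5));
    // let (.., middle, ..) = data; // does not compile: `..` used twice
}
```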

We can combine _ and .. together:

fn main() {
    let data = (1, 2, 3, 4, 5);
    let (_, second, ..) = data;
    assert_eq!(second, 2);
}
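Destructuring is not limited to let statements. The same patterns work in function parameters and in for loops, and they can be nested. A sketch with hypothetical helper names:

```rust
// Patterns work in function parameters: the tuple is destructured on the way in.
fn add((a, b): (i32, i32)) -> i32 {
    a + b
}

// Nested tuples are destructured with nested patterns.
fn flatten(data: ((i32, i32), i32)) -> [i32; 3] {
    let ((a, b), c) = data;
    [a, b, c]
}

// for loops destructure each item; iterating a slice yields references,
// so the pattern starts with &.
fn sum_of_products(pairs: &[(i32, i32)]) -> i32 {
    let mut total = 0;
    for &(x, y) in pairs {
        total += x * y;
    }
    total
}

fn main() {
    println!("{}", add((1, 2)));
    println!("{:?}", flatten(((1, 2), 3)));
    println!("{}", sum_of_products(&[(1, 2), (3, 4)]));
}
```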

All the examples so far used tuples, but everything that works for them works for arrays too:

fn main() {
    let data = [1, 2, 3, 4, 5];
    let [_, second, ..] = data;
    assert_eq!(second, 2);
}

The only difference is the syntax. We are destructuring an array, so instead of parentheses we use square brackets on the left side of the assignment. Everything else is the same.
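Arrays support one more trick: name @ .. binds the skipped elements to a variable. For a fixed-size array the binding is itself an array of the remaining length. A small sketch, the helper names are made up:

```rust
// Hypothetical helper: splits off the first element.
// `rest @ ..` binds everything after it; for an array, rest is itself an array.
fn split_first(data: [i32; 5]) -> (i32, [i32; 4]) {
    let [first, rest @ ..] = data;
    (first, rest)
}

// Hypothetical helper: `..` in the middle skips everything between the ends.
fn first_and_last(data: [i32; 5]) -> (i32, i32) {
    let [first, .., last] = data;
    (first, last)
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    // [i32; 5] is Copy, so data can be passed by value twice.
    println!("{:?}", split_first(data));
    println!("{:?}", first_and_last(data));
}
```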

Now let's see how struct destructuring works:

struct ServerConfig {
    port: u32,
    max_connections: u32,
}

fn main() {
    let data = ServerConfig {
        port: 80,
        max_connections: 100,
    };

    let ServerConfig {
        port: server_port,
        max_connections: server_max_conns,
    } = data;

    assert_eq!(server_port, 80);
    assert_eq!(server_max_conns, 100);
}

We have struct ServerConfig with two fields. We initialise it in main and destructure it immediately after that. Note that the pattern is the same: the left side of the let statement mimics the struct initialisation. Each field of the struct is bound to a local variable (e.g. port: server_port). If we want the local variables to have the same names as the fields, we can skip the mapping and use this shorter syntax:

struct ServerConfig {
    port: u32,
    max_connections: u32,
}

fn main() {
    let data = ServerConfig {
        port: 80,
        max_connections: 100,
    };

    let ServerConfig {
        port,
        max_connections,
    } = data;

    assert_eq!(port, 80);
    assert_eq!(max_connections, 100);
}
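Two more struct-destructuring tricks that I find handy, shown in a small sketch with hypothetical helpers: .. ignores the fields you don't name, and a pattern can appear directly in a function parameter. When destructuring through a reference, as in describe, the bound variables are references to the fields.

```rust
struct ServerConfig {
    port: u32,
    max_connections: u32,
}

// The parameter itself is a pattern: it dereferences the borrowed struct,
// binds port (u32 is Copy) and ignores every other field with `..`.
fn port_of(&ServerConfig { port, .. }: &ServerConfig) -> u32 {
    port
}

// Shorthand (port) and renaming (max_connections: conns) can be mixed.
// config is a reference, so port and conns are &u32 here.
fn describe(config: &ServerConfig) -> String {
    let ServerConfig { port, max_connections: conns } = config;
    format!("port {} with up to {} connections", port, conns)
}

fn main() {
    let config = ServerConfig { port: 80, max_connections: 100 };
    println!("{}", port_of(&config));
    println!("{}", describe(&config));
}
```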

With destructuring we can also extract the data attached to enum variants. But each variant can have different data attached to it, so destructuring enums is not as straightforward as destructuring tuples, arrays and structs. When we work with an enum we don't necessarily know which variant it holds, so we need to check that first. This can be done with an if let expression. Let's pretend mode in the next snippet is an OpMode. We can extract the data bundled with the Client variant like this:

if let OpMode::Client(port, buf_len) = mode {
    println!("Client: port: {}, buf_len: {}", port, buf_len);
}

If mode holds OpMode::Client, we destructure it into port and buf_len. If it doesn't, we do nothing. If we want to handle all variants, we need multiple if let expressions:

if let OpMode::Client(port, buf_len) = mode {
    println!("Client: port: {}, buf_len: {}", port, buf_len);
} else if let OpMode::Server { port, max_conns } = mode {
    println!("Server: port: {}, max_conns: {}", port, max_conns);
}
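For completeness, here is a chain like the one above as a complete function. It is a sketch: describe is a hypothetical name, and it matches on a reference so the caller keeps ownership of the mode. Note the final else: if let is not checked for exhaustiveness, and an if expression that produces a value needs an else branch for "none of the above".

```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

// Hypothetical helper: the if let chain, returning a String instead of printing.
fn describe(mode: &OpMode) -> String {
    if let OpMode::Client(port, buf_len) = mode {
        format!("Client: port: {}, buf_len: {}", port, buf_len)
    } else if let OpMode::Server { port, max_conns } = mode {
        format!("Server: port: {}, max_conns: {}", port, max_conns)
    } else {
        // Unreachable with two variants, but the compiler still requires it
        // and won't tell us if a new variant silently ends up here.
        String::from("unknown mode")
    }
}

fn main() {
    println!("{}", describe(&OpMode::Client(8080, 1024)));
    println!("{}", describe(&OpMode::Server { port: 80, max_conns: 10 }));
}
```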

Not exactly elegant, right? Luckily there is a better way with pattern matching.

Pattern matching

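Pattern matching is a powerful feature of Rust which, as far as I know, has no direct equivalent in C++. The closest things are switch, which only works on integral and enumeration values, and std::visit over a std::variant. Let's see a very simple example: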

fn main() {
    let cond = true;

    match cond {
        true => println!("Cond is true"),
        false => println!("Cond is false"),
    }
}

We have a main function which performs pattern matching on a bool. This is usually overkill, because an if statement is more than enough, but I want to start with something simple. Pattern matching is performed with the match keyword followed by an expression. Here it is a bool, but the expression can be of any type. The lines in the curly braces are called match arms. Each arm consists of a pattern, followed by =>, followed by the code to run. If the value of the expression matches the pattern of an arm, that arm's code is executed. Arms are checked from top to bottom and only the first matching arm runs; the arms after it are ignored.

In our example we are matching on cond which can be true or false. In case of true we execute println!("Cond is true"). On false - println!("Cond is false"). Note that our example uses a single statement in each match arm but this is not necessary. You can create a scope (with {}) and put whatever you want there.
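Like if, match is an expression, so it produces a value that can be returned or assigned. A short sketch (describe is a made-up name) where one arm is a block with several statements; the last expression in the block is the arm's value:

```rust
// The whole match evaluates to the value of the arm that matched,
// so every arm must produce the same type (&str here).
fn describe(cond: bool) -> &'static str {
    match cond {
        true => "Cond is true",
        false => {
            // A block arm can run several statements...
            println!("taking the false branch");
            // ...and its last expression becomes the arm's value.
            "Cond is false"
        }
    }
}

fn main() {
    let msg = describe(false);
    println!("{}", msg);
}
```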

Pattern matching is very useful with enums. Let's see how we can read the data attached to the enum variant from the example in the previous section:

enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

fn start(mode: OpMode) {
    match mode {
        OpMode::Client(port, buf_len) => println!("Client: port: {}, buf_len: {}", port, buf_len),
        OpMode::Server { port, max_conns } => {
            println!("Server: port: {}, max_conns: {}", port, max_conns)
        }
    };
}

fn main() {
    start(OpMode::Server {
        port: 80,
        max_conns: 10,
    })
}

First, let's focus on fn main. We create an instance of OpMode::Server and initialise its fields. Remember that Server uses named fields. Now let's have a look at fn start(mode: OpMode). It contains a match expression on mode. The first arm matches OpMode::Client, but note the names in the parentheses: OpMode::Client(port, buf_len). Besides matching, we are also destructuring the enum. The data bundled with the variant is bound to port and buf_len (remember that it is a tuple with just two values) and we can use them in the code block of the match arm. It's worth noting that port and buf_len live only in the scope of that match arm.

The second arm matches OpMode::Server, which uses named fields, and again we perform matching and destructuring at the same time. We use the short syntax (the variable names match the field names) and again we can use these variables in the code block of the match arm.
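Since start takes mode by value, the enum is consumed. Below is a sketch of a common alternative: matching on a reference so the caller keeps the value, with mode_port as a hypothetical helper. Each arm only binds the fields it needs: .. skips the remaining named fields and _ skips a tuple field. Because we match on &OpMode, the bindings are references, hence *port.

```rust
#[allow(dead_code)] // max_conns is never read in this sketch
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

// Hypothetical helper: matching on &OpMode borrows the enum instead of consuming it.
fn mode_port(mode: &OpMode) -> u32 {
    match mode {
        // `..` ignores the named fields we don't need; port is a &u32 here.
        OpMode::Server { port, .. } => *port,
        // `_` ignores the second tuple field.
        OpMode::Client(port, _) => *port,
    }
}

fn main() {
    let mode = OpMode::Client(8080, 1024);
    println!("port: {}", mode_port(&mode));
    // mode was only borrowed, so it can still be used here.
    println!("port again: {}", mode_port(&mode));
}
```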

One very important and useful thing I haven't mentioned so far is that a match expression must be exhaustive. In other words, every possible value has to be covered by some match arm. This is especially useful for enums. Let's see why. First, add a third variant to the enum in the last example:

enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
    Nothing,
}

And then compile the code. You will get something like:

error[E0004]: non-exhaustive patterns: `OpMode::Nothing` not covered
 --> src/main.rs:8:11
  |
8 |     match mode {
  |           ^^^^ pattern `OpMode::Nothing` not covered
  |
note: `OpMode` defined here
 --> src/main.rs:1:6
  |
1 | enum OpMode {
  |      ^^^^^^
...
4 |     Nothing,
  |     ------- not covered
  = note: the matched value is of type `OpMode`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
  |
12~         },
13+         OpMode::Nothing => todo!()
  |

For more information about this error, try `rustc --explain E0004`.

The first line says "non-exhaustive patterns: OpMode::Nothing not covered". The compiler generates an error for every place where we pattern match on the enum without covering all of its variants. This is great! Think about how many bugs you could introduce by adding a new variant and forgetting to handle it somewhere. With match expressions there is no need to worry about this (as long as you don't add a catch-all arm, more on that below), and for this reason I strongly advise using them instead of if/else chains.
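One way to fix the error, sketched below: add an arm for the new variant. I return a String instead of printing so the result is easy to check, and the message for Nothing is made up. Handling the variant explicitly instead of adding a _ arm keeps the check working the next time a variant is added.

```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
    Nothing,
}

// The match from start, now with an arm for every variant, so it compiles again.
fn start(mode: &OpMode) -> String {
    match mode {
        OpMode::Client(port, buf_len) => format!("Client: port: {}, buf_len: {}", port, buf_len),
        OpMode::Server { port, max_conns } => {
            format!("Server: port: {}, max_conns: {}", port, max_conns)
        }
        // An explicit arm instead of `_` keeps the exhaustiveness check
        // useful when the next variant is added.
        OpMode::Nothing => String::from("Nothing to start"),
    }
}

fn main() {
    println!("{}", start(&OpMode::Nothing));
    println!("{}", start(&OpMode::Client(8080, 1024)));
    println!("{}", start(&OpMode::Server { port: 80, max_conns: 10 }));
}
```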

Now that you know that match expressions must be exhaustive, you are probably asking yourself: "What if I want to match on a u32, or I have too many cases which I need to handle the same way?" There is a solution for that. Let's see a simple example where we match on a u32. The same technique can be applied to enums and other types:

fn main() {
    let cond: u32 = 5;

    match cond {
        0 => println!("cond is zero"),
        1 | 2 => println!("cond is 1 or 2"),
        cond if cond > 2 && cond < 6 => println!("cond is between 2 and 6"),
        _ => println!("cond is something else"),
    }
}

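The | operator combines several patterns in one arm: the arm matches if any of them matches. In this case we handle 1 and 2 in the same match arm. With if we can add an extra condition to an arm, called a match guard: the arm matches only if its pattern matches and the condition is true. Here the pattern cond binds the value to a new variable and the guard checks its range. The syntax may look weird, but it makes more sense when you are destructuring something more complicated, like an enum variant, e.g. OpMode::Client(port, buf_len) if port > 80 => {..}. And finally, _ is a catch-all pattern. It matches anything and ignores the value.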
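A sketch that puts these together. classify applies a guard to a destructured variant, like the OpMode::Client example above, and combines two variants with |. describe rewrites the u32 example with a range pattern (3..=5) instead of a guard and uses @ to bind the matched value to the name n. Both helper names and the messages are my own.

```rust
#[allow(dead_code)] // the Server fields are never read in this sketch
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
    Nothing,
}

// A guard on a destructured variant: the first arm only matches
// Client values whose port is above 80.
fn classify(mode: &OpMode) -> &'static str {
    match mode {
        OpMode::Client(port, _) if *port > 80 => "client on a high port",
        OpMode::Client(..) => "client on a low port",
        // `|` lets two variants share one arm.
        OpMode::Server { .. } | OpMode::Nothing => "not a client",
    }
}

// The u32 example with a range pattern instead of a guard.
// `n @ 3..=5` matches 3, 4 or 5 and binds the value to n.
fn describe(cond: u32) -> String {
    match cond {
        0 => String::from("cond is zero"),
        1 | 2 => String::from("cond is 1 or 2"),
        n @ 3..=5 => format!("cond is {}, between 2 and 6", n),
        _ => String::from("cond is something else"),
    }
}

fn main() {
    println!("{}", classify(&OpMode::Client(8080, 1024)));
    println!("{}", classify(&OpMode::Server { port: 80, max_conns: 10 }));
    println!("{}", describe(5));
}
```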

Conclusion

Enums, pattern matching and destructuring allow you to write very elegant code. What I really like is that each time you destructure or pattern match something (of course without using _ or ..), you have a guarantee that the compiler will notify you of any relevant change you might have missed (e.g. a new field in a struct or a new variant in an enum). Compared to searching the code base manually, this is a huge improvement.

Bundling data with enums also helps with encapsulation. As a C++ programmer I considered enums glorified constants at best. In Rust you can bundle data with them, which makes them much more useful. Of course this is far from a feature which will make you abandon C++ and start writing Rust, but it is a nice-to-have.

It's no coincidence that enums are used heavily in the Rust standard library, but more on this in the next post.
