Rust for C++ developers part 4: Enums, destructuring and pattern matching
Rust enums are similar to the ones in C++, with one notable difference: Rust enums allow you to bundle additional data with each variant. This feature, combined with destructuring and pattern matching, is very powerful. What I find interesting is that none of these features is anything special on its own, but combining them allows you to write very elegant code. In my opinion this is one of the coolest features of Rust. Let's start with enums.
Enums
We can declare an enum like this:
```rust
enum OpMode {
    Server,
    Client,
}
```
Each enum is a new type, so you can use it wherever a type is expected, for example as a function parameter. A function like `fn start(mode: OpMode) {}` can be invoked with a concrete enum variant: `start(OpMode::Server);`.
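A fieldless enum like this one is the closest Rust relative of a C++ enum. Here is a minimal sketch showing that each variant also carries an integer discriminant, starting at 0, which an `as` cast can read. The return value of `start` is made up for illustration and is not part of the original example.

```rust
// Fieldless enum: no data is attached to any variant.
enum OpMode {
    Server,
    Client,
}

// Hypothetical version of `start` that returns the variant's discriminant.
fn start(mode: OpMode) -> i32 {
    // Like a C++ enum, each variant has an integer value (0, 1, ...).
    mode as i32
}

fn main() {
    assert_eq!(start(OpMode::Server), 0);
    assert_eq!(start(OpMode::Client), 1);
}
```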
Attaching data to each enum variant is done like this:
```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}
```
First, note that `Server` and `Client` have different data attached to them.

Second, note that there are two ways to attach data to an enum variant. `Server` uses struct-like syntax, which means it has named fields, just like a struct. `Client` uses tuple-like syntax: there are no field names and each field is identified by its position. Named fields are better when there are a lot of parameters. Tuples are a good fit for self-explanatory data (e.g. complex numbers, coordinates). Both are used in practice.
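To see both syntaxes side by side, here is a small sketch that constructs one value of each variant. The `#[derive(Debug)]` attribute and the concrete port and buffer numbers are additions for illustration. The derived `Debug` output mirrors the syntax used to create each variant.

```rust
// Debug is derived only so the values can be printed; the fields are never
// read otherwise, hence the allow attribute.
#[allow(dead_code)]
#[derive(Debug)]
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

fn main() {
    // Struct-like variant: fields are set by name, in any order.
    let server = OpMode::Server { max_conns: 10, port: 80 };
    // Tuple-like variant: fields are set by position.
    let client = OpMode::Client(8080, 4096);

    // Prints: Server { port: 80, max_conns: 10 }
    println!("{:?}", server);
    // Prints: Client(8080, 4096)
    println!("{:?}", client);
}
```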
The next logical question is: if each variant has additional data attached, how do we access it? To answer this we need two more concepts from the Rust language: destructuring and pattern matching.
Destructuring
Destructuring means to bind a variable to each element/field of a type containing multiple elements (tuple, array) or multiple fields (struct, enum). This is probably too vague so let's see some examples. We'll use a tuple to demonstrate the syntax and then show examples with some other data types.
Destructuring a tuple means to bind its values to separate variables. For example:
```rust
fn main() {
    let data = (1, 2, 3);
    let (first, second, third) = data;
    assert_eq!(first, 1);
    assert_eq!(second, 2);
    assert_eq!(third, 3);
}
```
Here `data` is a tuple with three elements. We want to destructure it into three separate variables: `first`, `second` and `third`. Have a look at the second line in `main`. On the left side we use `let` and list the three variable names in parentheses, just as we would if we were writing a tuple. On the right side of the assignment is the tuple we want to destructure. Each value in the tuple is bound to the corresponding variable name on the left side. `assert_eq!` is a Rust macro which panics if its arguments are not equal. It is usually used in tests, but it's also useful in code examples like this one.
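Also note that destructuring moves the values out of `data`. In this example the elements are `i32`, which is `Copy`, so they are simply copied and `data` stays usable. With a non-`Copy` type such as `String`, `data` could no longer be used afterwards. Of course, we can borrow instead of moving: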
```rust
fn main() {
    let data = (1, 2, 3);
    let (first, second, third) = &data;
    assert_eq!(*first, 1);
    assert_eq!(*second, 2);
    assert_eq!(*third, 3);
}
```
In this case we borrow the elements of `data`, and `first`, `second` and `third` are references.
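The difference between copying, moving and borrowing only becomes visible with a non-`Copy` type. Here is a sketch using `String` elements. The helper names `join_owned` and `join_borrowed` are made up for this example.

```rust
/// Takes the tuple by value; destructuring then moves both Strings out of it.
fn join_owned(pair: (String, String)) -> String {
    let (first, second) = pair;
    format!("{} {}", first, second)
}

/// Takes a reference; destructuring binds `first` and `second` as `&String`.
fn join_borrowed(pair: &(String, String)) -> String {
    let (first, second) = pair;
    format!("{} {}", first, second)
}

fn main() {
    // i32 is Copy: destructuring copies the values, so `numbers` stays usable.
    let numbers = (1, 2);
    let (a, b) = numbers;
    assert_eq!(a + b, 3);
    assert_eq!(numbers, (1, 2));

    let names = (String::from("alice"), String::from("bob"));
    // Borrowing leaves `names` intact.
    assert_eq!(join_borrowed(&names), "alice bob");
    // Passing by value moves `names`; using it after this line would not compile.
    assert_eq!(join_owned(names), "alice bob");
}
```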
If we are not interested in some of the elements, we can skip them with `_`. For example, if we want only the second element we can do:
```rust
fn main() {
    let data = (1, 2, 3);
    let (_, second, _) = data;
    assert_eq!(second, 2);
}
```
With `..` we can skip all the remaining elements up to the end of the tuple. For the tuple in our example:
```rust
fn main() {
    let data = (1, 2, 3);
    let (first, ..) = data;
    assert_eq!(first, 1);
}
```
We can combine `_` and `..`:
```rust
fn main() {
    let data = (1, 2, 3, 4, 5);
    let (_, second, ..) = data;
    assert_eq!(second, 2);
}
```
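`..` is not limited to the end of the pattern. It can also skip elements at the start or in the middle, but it may appear only once per tuple pattern. A sketch, with the helpers `ends` and `last` being hypothetical:

```rust
/// Returns the first and the last element, skipping everything in between.
fn ends(data: (i32, i32, i32, i32, i32)) -> (i32, i32) {
    let (first, .., last) = data;
    (first, last)
}

/// Returns only the last element by skipping everything before it.
fn last(data: (i32, i32, i32, i32, i32)) -> i32 {
    let (.., last) = data;
    last
}

fn main() {
    let data = (1, 2, 3, 4, 5);
    assert_eq!(ends(data), (1, 5));
    assert_eq!(last(data), 5);
}
```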
All the examples so far used tuples, but everything that works for them works for arrays too:
```rust
fn main() {
    let data = [1, 2, 3, 4, 5];
    let [_, second, ..] = data;
    assert_eq!(second, 2);
}
```
The only difference is the syntax. We are destructuring an array, so instead of parentheses we use square brackets in the pattern. Everything else is the same.
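Arrays support one extra trick: the skipped part can be bound to a name with `name @ ..`, which produces an array of the remaining elements. Here is a minimal sketch. It assumes Rust 1.42 or newer, where these subslice patterns were stabilised, and `head_and_tail` is a hypothetical helper.

```rust
/// Splits a five-element array into its first element and the other four.
fn head_and_tail(data: [i32; 5]) -> (i32, [i32; 4]) {
    // `tail @ ..` binds the remaining elements as an array of length 4.
    let [head, tail @ ..] = data;
    (head, tail)
}

fn main() {
    let (head, tail) = head_and_tail([1, 2, 3, 4, 5]);
    assert_eq!(head, 1);
    assert_eq!(tail, [2, 3, 4, 5]);
}
```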
Now let's see how struct destructuring works:
```rust
struct ServerConfig {
    port: u32,
    max_connections: u32,
}

fn main() {
    let data = ServerConfig {
        port: 80,
        max_connections: 100,
    };
    let ServerConfig {
        port: server_port,
        max_connections: server_max_conns,
    } = data;
    assert_eq!(server_port, 80);
    assert_eq!(server_max_conns, 100);
}
```
We have a struct `ServerConfig` with two fields. We initialise it in `main` and destructure it immediately afterwards. Note that the pattern is the same: we have a `let` statement mimicking struct initialisation. Each field of the struct is bound to a local variable (e.g. `port: server_port`). If we want to give the local variables the same names as the fields, we can skip the mapping and use this shorter syntax:
```rust
struct ServerConfig {
    port: u32,
    max_connections: u32,
}

fn main() {
    let data = ServerConfig {
        port: 80,
        max_connections: 100,
    };
    let ServerConfig {
        port,
        max_connections,
    } = data;
    assert_eq!(port, 80);
    assert_eq!(max_connections, 100);
}
```
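Struct patterns have two more useful forms. A pattern can appear directly in a function parameter, and `..` ignores the fields you don't list. The sketch below reuses `ServerConfig`; the helpers `describe` and `port_only` are made up for illustration. Keep in mind that a pattern without `..` must name every field, so adding a field to the struct turns such a pattern into a compile error. That is often exactly what you want.

```rust
struct ServerConfig {
    port: u32,
    max_connections: u32,
}

/// Destructures the borrowed struct right in the parameter list;
/// `port` and `max_connections` are `&u32` here.
fn describe(ServerConfig { port, max_connections }: &ServerConfig) -> String {
    format!("port {} allows {} connections", port, max_connections)
}

/// Binds only `port`; `..` ignores all the other fields.
fn port_only(config: &ServerConfig) -> u32 {
    let ServerConfig { port, .. } = config;
    *port
}

fn main() {
    let config = ServerConfig { port: 80, max_connections: 100 };
    assert_eq!(describe(&config), "port 80 allows 100 connections");
    assert_eq!(port_only(&config), 80);
}
```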
With destructuring we can also extract the data attached to enum variants. But each variant can carry different data, so destructuring enums is not as straightforward as destructuring tuples, arrays and structs. When we work with an enum we don't necessarily know which variant it holds, so we need to check it somehow. This can be done with an `if let` statement. Let's assume `mode` in the next sample is an `OpMode`. We can extract the data bundled with the `Client` variant like this:
```rust
if let OpMode::Client(port, buf_len) = mode {
    println!("Client: port: {}, buf_len: {}", port, buf_len);
}
```
If `mode` holds `OpMode::Client`, we destructure it into `port` and `buf_len`. If it doesn't, we do nothing. If we want to handle all variants we need multiple `if let` statements:
```rust
if let OpMode::Client(port, buf_len) = mode {
    println!("Client: port: {}, buf_len: {:?}", port, buf_len);
} else if let OpMode::Server { port, max_conns } = mode {
    println!("Server: port: {}, max_conns: {:?}", port, max_conns);
}
```
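The snippets above assume `mode` already exists. The self-contained sketch below adds two details. First, `if let` can take a plain `else` branch and produce a value. Second, matching on a reference binds the fields by reference. The function `client_port` is hypothetical.

```rust
#[allow(dead_code)]
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

/// Returns the port of a `Client`, or `None` for any other variant.
fn client_port(mode: &OpMode) -> Option<u32> {
    // Matching on `&OpMode`, so `port` is a `&u32` and must be dereferenced.
    if let OpMode::Client(port, _) = mode {
        Some(*port)
    } else {
        None
    }
}

fn main() {
    assert_eq!(client_port(&OpMode::Client(8080, 4096)), Some(8080));
    assert_eq!(client_port(&OpMode::Server { port: 80, max_conns: 10 }), None);
}
```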
Not exactly elegant, right? Luckily there is a better way with pattern matching.
Pattern matching
Pattern matching is a powerful feature of Rust which, as far as I know, has no direct equivalent in C++ (`std::visit` on a `std::variant` is probably the closest thing). Let's see a very simple example:
```rust
fn main() {
    let cond = true;
    match cond {
        true => println!("Cond is true"),
        false => println!("Cond is false"),
    }
}
```
We have a `main` function which performs pattern matching on a `bool`. This is usually overkill, because an `if` statement would be more than enough, but I want to start with something simple. Pattern matching is performed with the `match` keyword followed by an expression. Here we have a `bool`, but the expression can be anything. What we have in the curly braces are called 'match arms'. Each arm consists of a pattern, followed by `=>` and the code to execute. If the value of the `match` expression matches an arm's pattern, that arm's code is executed. Match arms are evaluated top to bottom, and once an arm matches, the following arms are ignored.

In our example we are matching on `cond`, which can be `true` or `false`. In case of `true` we execute `println!("Cond is true")`; on `false`, `println!("Cond is false")`. Note that our example uses a single expression in each match arm, but this is not required. You can create a scope (with `{}`) and put whatever you want there.
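`match` is an expression, so it produces a value. Each arm's result becomes the value of the whole `match`, which means all arms must have the same type. The small sketch below shows this and also uses a `{}` block as an arm. The `describe` function is hypothetical.

```rust
/// Converts a bool to a description using `match` as an expression.
fn describe(cond: bool) -> String {
    match cond {
        true => String::from("Cond is true"),
        // A block arm: the last expression in the block is its value.
        false => {
            let prefix = "Cond is";
            format!("{} false", prefix)
        }
    }
}

fn main() {
    assert_eq!(describe(true), "Cond is true");
    assert_eq!(describe(false), "Cond is false");
}
```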
Pattern matching is very useful with enums. Let's see how we can read the data attached to the enum variant from the example in the previous section:
```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

fn start(mode: OpMode) {
    match mode {
        OpMode::Client(port, buf_len) => println!("Client: port: {}, buf_len: {}", port, buf_len),
        OpMode::Server { port, max_conns } => {
            println!("Server: port: {}, max_conns: {}", port, max_conns)
        }
    };
}

fn main() {
    start(OpMode::Server {
        port: 80,
        max_conns: 10,
    })
}
```
First let's focus on `fn main`. We create an instance of `OpMode::Server` and initialise its values. Remember that `Server` uses named fields. Now let's have a look at `fn start(mode: OpMode)`. We have a match expression on `mode`. The first arm matches `OpMode::Client`, but note the names in parentheses: `OpMode::Client(port, buf_len)`. Besides matching, we are also destructuring the enum. The data bundled with the variant is bound to `port` and `buf_len` (remember it was a tuple with just two values), and we can use them in the code block of the match arm. It's worth noting that `port` and `buf_len` live only in the scope of the match arm.

The second arm matches `OpMode::Server`, which uses named fields. Again we perform matching and destructuring. We use the short syntax (the bound variables have the same names as the fields), and again we can use these variables in the code block of the match arm.
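The `start` function above takes `mode` by value and only prints. As a variation, the sketch below matches on a reference and returns a value from each arm. `_` skips a tuple field and `..` skips the remaining named fields, just as in plain destructuring. The `port` function is a hypothetical helper.

```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

/// Returns the port regardless of the variant.
fn port(mode: &OpMode) -> u32 {
    match mode {
        // `_` ignores `buf_len`; `port` is bound as `&u32`.
        OpMode::Client(port, _) => *port,
        // `..` ignores `max_conns`.
        OpMode::Server { port, .. } => *port,
    }
}

fn main() {
    let modes = [
        OpMode::Server { port: 80, max_conns: 10 },
        OpMode::Client(8080, 4096),
    ];
    // `iter()` yields `&OpMode`, so `modes` is only borrowed.
    let ports: Vec<u32> = modes.iter().map(port).collect();
    assert_eq!(ports, vec![80, 8080]);
}
```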
One very important and useful thing I haven't mentioned so far is that a match expression needs to be exhaustive: every possible value must be covered by some arm. This is especially useful for enums. Let's see why. First, add a third variant to the enum in the last example:
```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
    Nothing,
}
```
And then compile the code. You will get something like:
```
error[E0004]: non-exhaustive patterns: `OpMode::Nothing` not covered
 --> src/main.rs:8:11
  |
8 |     match mode {
  |           ^^^^ pattern `OpMode::Nothing` not covered
  |
note: `OpMode` defined here
 --> src/main.rs:1:6
  |
1 | enum OpMode {
  |      ^^^^^^
...
4 |     Nothing,
  |     ------- not covered
  = note: the matched value is of type `OpMode`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
   |
12 ~             },
13 +             OpMode::Nothing => todo!()
   |

For more information about this error, try `rustc --explain E0004`.
```
The first line says "non-exhaustive patterns: `OpMode::Nothing` not covered". The compiler generates an error for every place where we pattern match on the enum without covering all the variants. This is great! Think about how many bugs you can create by introducing a new variant and forgetting to handle it somewhere. With `match` expressions there is no need to worry about this, and for this reason I strongly advise using them instead of chains of `if let`/`else` statements.
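Fixing the error means adding an arm for the new variant, as the compiler suggests. Here is a sketch of the extended example. It returns a `String` instead of printing, which is a change made for illustration.

```rust
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
    Nothing,
}

fn start(mode: OpMode) -> String {
    match mode {
        OpMode::Client(port, buf_len) => format!("Client: port: {}, buf_len: {}", port, buf_len),
        OpMode::Server { port, max_conns } => {
            format!("Server: port: {}, max_conns: {}", port, max_conns)
        }
        // Without this arm the code fails to compile with error E0004.
        OpMode::Nothing => String::from("Nothing to start"),
    }
}

fn main() {
    assert_eq!(start(OpMode::Nothing), "Nothing to start");
    assert_eq!(start(OpMode::Client(8080, 4096)), "Client: port: 8080, buf_len: 4096");
}
```

Naming `OpMode::Nothing` explicitly, rather than adding a `_` arm, keeps the safety net in place: a fourth variant added later would again be reported by the compiler.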
Now that you know match expressions must be exhaustive, you are probably asking yourself: "What if I want to match on a `u32`, or I have too many cases which I need to handle the same way?". There is a solution for that. Let's see a simple example where we match on a `u32`. The same technique can be applied to enums and other expressions:
```rust
fn main() {
    let cond: u32 = 5;
    match cond {
        0 => println!("cond is zero"),
        1 | 2 => println!("cond is 1 or 2"),
        cond if cond > 2 && cond < 6 => println!("cond is between 2 and 6"),
        _ => println!("cond is something else"),
    }
}
```
With `|` we can combine multiple patterns in a single arm. In this case we handle 1 and 2 within the same match arm. With `if` we can add an extra condition, called a match guard, to an arm. The syntax may look weird here, but it makes more sense when you are destructuring something more complicated like an enum variant, e.g. `OpMode::Client(port, buf_len) if port > 80 => {..}`. And finally, `_` is a catch-all pattern. It matches anything and ignores the value.
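The sketch below shows two related tools. First, an inclusive range pattern such as `3..=5` can replace the guard from the previous example. Second, a guard on an enum arm can inspect the destructured data. The compiler does not use guards when checking exhaustiveness, so an arm with an `if` does not count as covering its variant, and another arm must handle the remaining cases. The helpers `classify` and `client_kind` are hypothetical.

```rust
#[allow(dead_code)]
enum OpMode {
    Server { port: u32, max_conns: usize },
    Client(u32, u32),
}

/// Same logic as the `cond` example, with a range pattern instead of a guard.
fn classify(cond: u32) -> &'static str {
    match cond {
        0 => "cond is zero",
        1 | 2 => "cond is 1 or 2",
        3..=5 => "cond is between 2 and 6",
        _ => "cond is something else",
    }
}

/// A guard on an enum variant; `port` is `&u32` because we match on a reference.
fn client_kind(mode: &OpMode) -> &'static str {
    match mode {
        OpMode::Client(port, _) if *port > 80 => "client on a port above 80",
        // Required: the guarded arm above does not cover every `Client`.
        OpMode::Client(..) => "client on port 80 or below",
        OpMode::Server { .. } => "server",
    }
}

fn main() {
    assert_eq!(classify(5), "cond is between 2 and 6");
    assert_eq!(classify(7), "cond is something else");
    assert_eq!(client_kind(&OpMode::Client(8080, 4096)), "client on a port above 80");
    assert_eq!(client_kind(&OpMode::Client(80, 4096)), "client on port 80 or below");
}
```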
Conclusion
Enums, pattern matching and destructuring allow you to write very elegant code. What I really like is that each time you destructure or pattern match something (without using `_` or `..`, of course), you have a guarantee that the compiler will notify you of any change you might have missed, e.g. a new variant in an enum or a new field in a struct. Compared to manually searching the code base, this is a huge improvement.
Bundling data with enums also helps with encapsulation. As a C++ programmer I considered enums glorified constants at best. In Rust you can bundle data with them and make them much more useful. Of course, this is far from a feature which will make you abandon C++ and start writing Rust, but it is nice to have.
It's no coincidence that enums are used heavily in the Rust standard library, but more on this in the next post.